ABSTRACT


Details are given of further results which have been obtained for the Log-variance test applied to the Analysis of Variance of Variances; an alternative method, the Log-range test, is proposed. Transformations in the Analysis of Variance are discussed, and a test is proposed for deciding whether or not to transform the data. Finally, investigation into the problems that arise when the sample variates are not independent is mentioned. The topics included in this discussion are (i) transformation of the variates, and (ii) the effect upon the distribution of the sample range.
INTRODUCTION


This report is concerned with work on a project having the general title "Research on Transformations in the Analysis of Variance". The project was initiated with the purpose of investigating the following topics:

1. The theory of various transformations in the Analysis of Variance, including square-root, logarithmic and reciprocal. This investigation shall include a study of objective tests which the experimenter might employ as criteria for determining the type of transformation to use, and of the possibility of devising better tests. Consideration shall be given to the effects on the final interpretation of the data of (a) applying an unnecessary transformation and (b) failure to apply a necessary transformation.

2. The relative importance of homogeneity of variance and of additivity of effects in the Analysis of Variance. This investigation shall be directed toward answering the question of whether, prior to performing an Analysis of Variance, one should transform the data so as to (i) equalize the variances, (ii) minimize the ratio of the mean square for Tukey's one degree of freedom for non-additivity to the residual mean square, or (iii) endeavor to make non-significant the departures from both homogeneity and additivity.

3. The procedure for the Analysis of Variance applied to variances, with particular attention given to the transformation required, to the optimum division of the observations into subgroups, and to the power of the resulting tests. In particular, a Monte Carlo study shall be made of the Analysis of Variance of log s² for samples of a particular size, subdivided in various ways.

4. The best procedure for the Analysis of Variance of attributes data (binomially distributed). Consideration shall be given to the relative merits of logit, probit, and anglit transformations. A comparison of factorial chi-square tests and conventional F-tests shall be made. The effects of the transformation on the data source will be emphasized in this investigation, rather than the effects on the data themselves.

5. Procedures designed to produce normality. In particular, consideration shall be given to (i) the transformation to standard normal scores and (ii) procedures which assume that $y = (x + c)^p$ is normally distributed and estimate c and p by (a) the method of moments and (b) the method of maximum likelihood.

Several Technical Reports and Notes have been issued previously. The first report [29] considered general problems of transformations in the Analysis of Variance, whereas [30] and [34] concentrated mainly upon the logarithmic and square-root transformations (Topic 1). The Analysis of Variance of Attributes Data (Topic 4) was discussed in [28], and one note [26] has so far been issued on the Analysis of Variance of Variances (Topic 3).

The present report describes research that has been carried out on this project since the above-mentioned Technical Reports and Notes were issued. The work discussed here consists mainly of further research into the problem of the Analysis of Variance of Variances, together with a section devoted to the problem of deciding whether or not to transform the data before carrying out an Analysis of Variance. It is hoped that separate, and more detailed, Technical Documentary Reports based on these investigations will be issued shortly.
1. THE ANALYSIS OF VARIANCE OF VARIANCES.


1.1. The Problem of Heterogeneity of Variance. The standard one-way Analysis of Variance model for means, with equal sample sizes, has the form

$$ y_{kj} = \mu_k + e_{kj} \qquad (k = 1, 2, \dots, K;\ j = 1, 2, \dots, J) \qquad \ldots(1.1) $$

where $y_{kj}$ is the j-th observation from the k-th population,
$\mu_k$ is the true mean of the k-th population, and
$e_{kj}$ is a random variable with mean zero and variance $\sigma_k^2$.
J is the number of observations taken from each of K populations.

The decision to accept or reject the null hypothesis, $H_0: \mu_k = \mu$ ($\mu$ unspecified) for all k, at a significance level $\alpha$, is made by comparing the magnitude of the F-ratio (that is, the ratio of the Between Groups Mean Square to the Within Groups Mean Square) with a preassigned significance point $F_\alpha$. Now the calculation of the distribution of the F-ratio, and hence of the significance points $F_\alpha$, depends upon the assumption that the $e_{kj}$ in the above model are normal independent deviates with zero mean and common variance $\sigma^2$.

It is therefore necessary to be able to test this assumption before placing reliance upon results that may be obtained from the Analysis of Variance.

A previous report entitled "Notes on the Analysis of Variance of Logarithms of Variances" [26] described a procedure for testing whether the variances of the $e_{kj}$ are equal for all of the K populations. A summary of this report is included below so that the more recent work described may be readily followed.

1.2. The Analysis of Variance of Logarithms of Variances. It has been suggested (Box, Biometrika, 40, 1953, pp. 318-335 [5]) that the Bartlett test for the equality of variances is very sensitive to departure from normality as well as to heterogeneity of variance. On the other hand, the F-test obtained in the Analysis of Variance for means is relatively "robust" with respect to departures from normality per se, at least for the case with equal sample sizes, whilst it is affected seriously by variance heterogeneity. Thus it was desired to obtain a test that would be far less dependent upon the normality assumption than is the Bartlett test.

The procedure proposed is to divide the observations within each group into subgroups, apply a logarithmic transformation to the subgroup variances, and then perform an Analysis of Variance on the logarithms. An example of the method is worked, and a justification of the procedure is given in detail.

The method is as follows:

1) Divide the observations $y_{kj}$ within each of the K groups into M subgroups of size A ($MA = J$, $M > 1$, $A > 1$), the division being carried out by a randomizing procedure.

2) Denote by $y_{kma}$ the a-th observation in the m-th subgroup of the k-th group. Then calculate the sum of the squares of deviations from the mean for each subgroup, i.e., calculate

$$ \sum_{a=1}^{A} (y_{kma} - \bar y_{km\cdot})^2, $$

where

$$ \bar y_{km\cdot} = \frac{1}{A}\sum_{a=1}^{A} y_{kma} $$

is the mean of the data in the m-th subgroup of the k-th group.

3) Calculate the logarithm of the above sum of squares, and call this variate $z_{km}$. Thus

$$ z_{km} = \log_{10}\Big[\sum_{a=1}^{A} (y_{kma} - \bar y_{km\cdot})^2\Big]. \qquad \ldots(1.2) $$

4) Carry out the standard Analysis of Variance technique on the $z_{km}$. Then if we denote the Between Groups Sum of Squares by $S_B$ and the Within Groups Sum of Squares by $S_W$, the F-ratio $F_L$ is obtained, where

$$ F_L = \frac{S_B/(K-1)}{S_W/K(M-1)}. \qquad \ldots(1.3) $$

Clearly, since the $z_{km}$ are not distributed normally, the distribution of $F_L$ will not be exactly that of a "normal theory" F. However, several approximations were considered, and approximate percentage points were obtained.
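Steps 1)-4) can be sketched in Python as follows. This is a minimal illustration, not the report's own computation: it assumes the data are held in a K × J NumPy array with one row per group, and the function names `log_variances` and `lv_f_ratio` are hypothetical. The p-value it returns refers $F_L$ to the normal-theory F distribution, which, as noted above, is only an approximation.

```python
import numpy as np
from scipy import stats


def log_variances(y, M, seed=None):
    """Steps 1)-3): return the K x M array z_km = log10(subgroup sum of squares)."""
    y = np.asarray(y, dtype=float)
    K, J = y.shape
    if J % M != 0:
        raise ValueError("J must equal M * A")
    A = J // M
    if M < 2 or A < 2:
        raise ValueError("need M > 1 and A > 1")
    rng = np.random.default_rng(seed)
    # Step 1: random division of each group's J observations into M subgroups of size A.
    perm = np.array([rng.permutation(J) for _ in range(K)])
    sub = np.take_along_axis(y, perm, axis=1).reshape(K, M, A)
    # Step 2: sum of squared deviations from each subgroup mean.
    ss = ((sub - sub.mean(axis=2, keepdims=True)) ** 2).sum(axis=2)
    # Step 3: z_km, equation (1.2).
    return np.log10(ss)


def lv_f_ratio(z):
    """Step 4): one-way ANOVA on z (K groups of M values); returns (F_L, approximate p)."""
    z = np.asarray(z, dtype=float)
    K, M = z.shape
    group_means = z.mean(axis=1)
    S_B = M * ((group_means - z.mean()) ** 2).sum()       # Between Groups SS
    S_W = ((z - group_means[:, None]) ** 2).sum()         # Within Groups SS
    F_L = (S_B / (K - 1)) / (S_W / (K * (M - 1)))         # equation (1.3)
    p_approx = stats.f.sf(F_L, K - 1, K * (M - 1))        # normal-theory approximation only
    return F_L, p_approx


rng = np.random.default_rng(0)
y = rng.normal(size=(5, 12)) * np.sqrt([1, 1, 1, 1, 4])[:, None]  # group 5 has 4x the variance
z = log_variances(y, M=3, seed=1)
print(lv_f_ratio(z))
```

Because changing the base of the logarithm multiplies every $z_{km}$ by the same constant, the ratio $F_L$ is the same whether log10 or ln is used.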

Since the technical note [26], summarized above, was issued, further study has been given to the problem. This is described below.
The power of a statistical test is defined as the probability of rejecting the null hypothesis when it is false; that is, the probability of reaching a correct decision when the null hypothesis is not true. Thus if we denote the null hypothesis by $H_0$, and any given alternative by $H_1$, then the power with respect to $H_1$ is given by

$$ \text{Power} = P\{\text{rejecting } H_0 \mid H_1 \text{ is true}\}. $$

It was not possible to obtain the exact distribution of $F_L$ either under the null hypothesis or under any alternative. Thus, in order to investigate the power of the test, it was necessary to approximate the distribution. Now the quantity $P\{F_L > F_\alpha \mid H_0\}$ represents the probability of Type I error, when this probability is nominally $\alpha$. Clearly the probability will not in general be exactly $\alpha$, since $F_L$ is not distributed exactly as F. Similarly, an approximation to $P\{F_L > F_\alpha \mid H_1\}$ gives an approximation to the power of the test.
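The probabilities $P\{F_L > F_\alpha \mid H_0\}$ and $P\{F_L > F_\alpha \mid H_1\}$ can also be estimated by simulation. The report itself approximated the distribution of $F_L$ by the David-Johnson method; the Monte Carlo sketch below is a swapped-in alternative, assuming normal populations, K groups of J observations each, and the nominal point $F_\alpha$ taken from the central F distribution with K-1 and K(M-1) degrees of freedom. The names `lv_statistic` and `mc_rejection_rate` are hypothetical.

```python
import numpy as np
from scipy import stats


def lv_statistic(y, M):
    """F_L for a K x J array whose rows are i.i.d. within each group (sketch).

    Simulated observations are exchangeable within a group, so taking them in
    generated order already gives a random division into M subgroups of size A.
    """
    K, J = y.shape
    A = J // M
    sub = y.reshape(K, M, A)
    z = np.log(((sub - sub.mean(axis=2, keepdims=True)) ** 2).sum(axis=2))
    gm = z.mean(axis=1)
    S_B = M * ((gm - z.mean()) ** 2).sum()
    S_W = ((z - gm[:, None]) ** 2).sum()
    return (S_B / (K - 1)) / (S_W / (K * (M - 1)))


def mc_rejection_rate(variances, J, M, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of P{F_L > F_alpha} for normal populations with the given variances."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.asarray(variances, dtype=float))[:, None]
    K = sd.shape[0]
    f_alpha = stats.f.ppf(1 - alpha, K - 1, K * (M - 1))   # nominal significance point F_alpha
    hits = sum(lv_statistic(rng.normal(size=(K, J)) * sd, M) > f_alpha for _ in range(reps))
    return hits / reps


size = mc_rejection_rate([1.0, 1.0], J=12, M=3)    # H_0 true: actual Type I error
power = mc_rejection_rate([8.0, 1.0], J=12, M=3)   # one variance 8 times the other
print(size, power)
```

With equal variances the estimate approximates the actual Type I error; with unequal variances, the power. The Monte Carlo error is roughly $\sqrt{p(1-p)/\text{reps}}$.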

The investigation of the power of the test was confined to consideration of the two following forms of alternative:

a) The variance of one of the K populations is equal to $\theta\sigma^2$. The variance of each of the remaining K-1 populations is equal to $\sigma^2$.
b1) For K even, the variance of each of K/2 of the populations is equal to $\theta\sigma^2$. The variance of each of the remaining K/2 populations is equal to $\sigma^2/\theta$.
b2) For K odd, the variance of each of (K-1)/2 of the populations is $\theta\sigma^2$. The variance of each of (K-1)/2 populations is equal to $\sigma^2/\theta$. The variance of the remaining population is $\sigma^2$.
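The two families of alternatives can be written as vectors of population variances. A minimal sketch follows; the helper name `alternative_variances` is hypothetical, and which population receives the slipped variance is arbitrary (here, the first).

```python
import numpy as np


def alternative_variances(K, theta, kind, sigma2=1.0):
    """Population variances under alternative (a), or (b1)/(b2), of the text (sketch)."""
    if kind == "a":
        v = np.full(K, sigma2, dtype=float)
        v[0] = theta * sigma2          # single slippage: one population at theta * sigma^2
        return v
    if kind == "b":
        half = K // 2                  # K/2 when K is even, (K-1)/2 when K is odd
        v = [theta * sigma2] * half + [sigma2 / theta] * half
        if K % 2 == 1:
            v.append(sigma2)           # b2): the remaining population keeps sigma^2
        return np.array(v)
    raise ValueError("kind must be 'a' or 'b'")


print(alternative_variances(5, 4.0, "a"))
print(alternative_variances(5, 2.0, "b"))
```

Under (b) the geometric mean of the variances stays equal to $\sigma^2$, so the alternative moves the log-variances symmetrically about $\ln\sigma^2$.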

Approximations to the power curves have been calculated for K = 2, 5, 10, and 15 with sample sizes of 12 and 24. The samples have been divided into subgroups of all possible equal sizes in order to ascertain which form of subdivision gives the greatest power (see Tables 1-12). The method of approximation used was that suggested by David and Johnson [11], [10]. The frequency curves used for this approximation were:

K = 2, K = 5: Pearson Type IV curve (except for the underlined values, which were calculated from a Pearson Type VI curve);

K = 10, K = 15: Edgeworth series (using the first four terms).

For K = 2 and K = 5 some values are omitted from the Tables, the reason being that for these cases the computing time would have been too great. The values obtained were usually accurate to at least two decimal places, and, since this degree of accuracy is sufficient for making the necessary power comparisons in this investigation, all the values have been listed to two decimal places.


TABLES 1-12*

Power (approximate) of the Analysis of Variance of Logarithms of Variances Test when the nominal significance level is 5 per cent and when there are K groups each containing M subgroups of size A, for various values of θ.


* The notation used to head the Tables will be: $P\{F_L > F_{.05} \mid K,\ MA\}$.
TABLE 1

$P\{F_L > F_{.05} \mid K = 2,\ MA = 12\}$

Alternatives of Type (a)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
      1.0     .05       ?     .04       -
      1.5     .08     .09       ?     .12
      2.0     .11     .14       -       -
      2.5     .14     .19     .20     .11
      3.0     .17     .24     .25     .20
      3.5     .20     .28     .29     .24
      4.0     .23     .32     .33     .27
      4.5     .25     .36     .37     .22
      5.0     .28     .39     .40       -
      5.5     .30     .42     .44       -
      6.0     .32     .45     .46       -
      7.0     .36     .50     .52       -
      8.0     .39     .55     .56       ?
      9.0     .42     .59     .60     .45
     10.0     .45     .62     .63     .47
     12.0     .50     .67     .69       ?
     14.0     .54     .72     .73     .55
     16.0     .57     .75     .76     .58
     18.0     .60     .78     .79       ?
     20.0     .63       ?     .81       ?
TABLE 2

$P\{F_L > F_{.05} \mid K = 2,\ MA = 24\}$

Alternatives of Type (a)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
      1.0     .06     .06     .06     .06       ?     .06
      1.5     .09     .11     .12     .13       ?     .12
      2.0     .14     .21     .24     .25     .25     .20
      2.5     .20     .31     .36     .38     .35       -
      3.0     .26     .41     .48     .50     .46     .33
      3.5     .32     .50     .57     .59     .55     .38
      4.0     .37     .57     .65     .67     .62     .43
      4.5       ?     .63     .71     .73     .69     .47
      5.0     .46     .68     .75     .78     .74     .51
      5.5     .49     .72     .79     .82     .78     .54
      6.0     .53     .75     .82     .85     .81     .57
      7.0     .59     .81     .87     .89     .86     .62
      8.0     .63     .85     .91     .92     .90     .66
      9.0     .67     .88     .93     .94     .92     .70
     10.0     .71     .90     .95     .96     .94     .73
     12.0     .76     .93     .97     .97     .96     .78
     14.0     .80     .95     .98     .98     .97     .82
     16.0     .83     .97     .98     .99     .98     .85
     18.0     .86     .97     .99     .99     .99     .87
     20.0     .87     .98     .99     .99     .99     .89


TABLE 3

$P\{F_L > F_{.05} \mid K = 5,\ MA = 12\}$

Alternatives of Type (a)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
      1.0     .05       -       -       -
      1.5     .06     .06     .07       ?
      2.0     .07     .10     .11     .10
      2.5     .09     .13     .15       ?
      3.0     .11     .17     .19     .22
      3.5       ?     .21     .24       -
      4.0       ?     .25     .29       ?
      4.5       -     .30     .33     .29
      5.0     .20     .34     .37     .32
      5.5       -     .37     .41     .35
      6.0       -       ?     .45     .38
      7.0       -     .47     .52     .44
      8.0       -     .53     .58     .49
      9.0       -     .58     .63     .53
     10.0     .39     .62     .67     .57
     12.0       -     .68     .73     .63
     14.0       -     .73     .78     .69
     16.0       -     .77     .82     .73
     18.0       -     .80     .85     .76
     20.0     .60     .83     .87     .79
TABLE 4

$P\{F_L > F_{.05} \mid K = 5,\ MA = 24\}$

Alternatives of Type (a)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
      1.0     .05     .05     .05     .05     .05       -
      1.5     .06     .08     .09     .10     .10     .10
      2.0     .10     .15     .19       ?     .20     .15
      2.5     .14     .25     .31     .34     .33       ?
      3.0     .20     .36     .44     .48     .46     .35
      3.5     .25     .45     .55     .60     .57     .43
      4.0     .30     .54     .64     .69     .67     .51
      4.5       -     .61     .71     .76     .74     .58
      5.0     .40     .67     .77     .81     .80     .64
      5.5       -     .72       -     .86     .84     .69
      6.0       -     .76       -     .89     .87     .73
      7.0       -     .83       -     .93     .92     .80
      8.0       -       -       -     .95     .95     .85
      9.0       -       -       -     .97     .96     .89
     10.0     .63       -       -     .98     .98     .91
     12.0       -       -       -     .99     .99     .94
     14.0       -       -       -     .99     .99     .96
     16.0       -       -       -    1.00    1.00     .98
     18.0       -       -       -    1.00    1.00     .98
     20.0     .96       -       -    1.00    1.00     .99
TABLE 5

$P\{F_L > F_{.05} \mid K = 10,\ MA = 12\}$

Alternatives of Type (a)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
      1.0     .05     .04     .04     .05
      1.5     .05     .05     .05     .06
      2.0     .06     .07     .08     .08
      2.5     .07     .10     .11     .11
      3.0     .09     .13     .15     .15
      3.5     .10     .16     .18     .18
      4.0     .12     .19     .22     .22
      4.5     .13     .23     .27     .25
      5.0     .15     .26     .31     .29
      5.5     .17     .29     .34     .32
      6.0     .18     .33     .38     .36
      7.0     .22     .39     .45     .42
      8.0     .25     .44     .51     .48
      9.0     .28     .49     .56     .53
     10.0     .31     .54     .61     .57
     12.0     .36     .61     .69     .64
     14.0     .41     .67     .75     .70
     16.0     .45     .72     .79     .75
     18.0     .49     .76     .82     .79
     20.0     .52     .79     .85     .62
TABLE 6

$P\{F_L > F_{.05} \mid K = 10,\ MA = 12\}$

Alternatives of Type (b)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
      1.0     .05     .04     .04     .05
      1.5     .11     .18     .21       ?
      2.0     .30     .53     .61     .57
      2.5     .52     .81     .87     .83
      3.0     .71     .93     .96     .94
      3.5     .82     .98     .99     .98
      4.0     .90     .99    1.00     .99
      4.5     .94    1.00    1.00    1.00
      5.0     .96    1.00    1.00    1.00
TABLE 7

$P\{F_L > F_{.05} \mid K = 10,\ MA = 24\}$

Alternatives of Type (a)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
      1.0     .05     .05     .05     .05     .05     .05
      1.5     .06     .07     .08     .06     .08     .08
      2.0     .08     .12     .15     .16     .16     .14
      2.5     .11     .19     .25     .27     .27     .23
      3.0     .15     .28     .36     .40     .40     .32
      3.5     .19     .37     .46     .52     .51     .42
      4.0     .23     .45     .56     .62     .61     .51
      4.5     .27     .53     .64     .70     .70     .58
      5.0     .31     .59     .71     .77     .76     .65
      5.5     .35     .65     .76     .82     .61     .71
      6.0     .39     .70     .81     .86     .85     .76
      7.0     .46     .78     .87     .91     .91     .83
      8.0     .52     .83     .91     .95     .94     .88
      9.0     .57     .87     .94     .97     .96     .91
     10.0     .62     .90     .96     .98       ?     .94
     12.0     .70     .94     .98     .99     .99     .96
     14.0     .75     .96     .99     .99     .99     .98
     16.0     .80     .97     .99    1.00    1.00     .99
     18.0     .83     .98    1.00    1.00    1.00     .99
     20.0     .86     .99    1.00    1.00    1.00    1.00
TABLE 8

$P\{F_L > F_{.05} \mid K = 10,\ MA = 24\}$

Alternatives of Type (b)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
     1.00     .05     .05     .05     .05     .05     .05
     1.05     .05     .05     .06     .05     .05     .06
     1.10     .06     .06     .07     .07     .07     .07
     1.20     .07     .10     .12     .13     .13     .12
     1.30     .10     .17     .22     .25     .25     .21
     1.40     .15     .28     .36     .41     .41     .34
     1.50     .21     .42     .53     .59     .58     .48
     1.60     .28     .56     .68     .74     .73     .62
     1.70     .37     .68     .80     .85     .85     .75
     1.80     .45     .78     .88     .92     .92     .84
     1.90     .54     .86     .94     .96     .96     .90
     2.00     .62     .91     .97     .98     .98     .94
     2.20     .75     .97     .99    1.00    1.00     .98
     2.50       -     .99    1.00    1.00    1.00    1.00
     3.00       -    1.00    1.00    1.00    1.00    1.00
TABLE 9

$P\{F_L > F_{.05} \mid K = 15,\ MA = 12\}$

Alternatives of Type (a)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
      1.0     .05     .05     .05     .05
      1.5     .05     .06     .06     .05
      2.0     .06     .07     .07       ?
      2.5     .07     .09     .10     .10
      3.0     .08     .11     .13     .12
      3.5     .09     .14     .16     .15
      4.0     .10     .17     .19     .18
      4.5     .11     .19     .23     .21
      5.0     .12     .22     .26     .24
      5.5     .14     .25     .30     .28
      6.0     .15     .28     .33     .31
      7.0     .17     .33     .39     .36
      8.0     .20     .39     .45     .42
      9.0     .22     .43     .51     .47
     10.0     .25     .48     .55     .51
     12.0     .29     .55     .63     .59
     14.0     .33     .61     .70     .66
     16.0     .37     .67     .75     .71
     18.0     .40     .71     .79     .75
     20.0     .44     .74     .82     .78
TABLE 10

$P\{F_L > F_{.05} \mid K = 15,\ MA = 12\}$

Alternatives of Type (b)

θ \ (M,A)   (6,2)   (4,3)   (3,4)   (2,6)
     1.00     .05     .05     .05     .05
     1.25     .07     .09     .09     .09
     1.50       -     .21     .25     .23
     1.75       -     .41     .48     .45
     2.00       -     .62     .71     .67
     2.25       -     .79     .86     .83
     2.50       -     .89     .94     .92
     2.75       -     .95     .97     .96
     3.00       -     .97     .99     .98
     3.25       -     .99    1.00     .99
     3.50       -     .99    1.00    1.00
TABLE 11

$P\{F_L > F_{.05} \mid K = 15,\ MA = 24\}$

Alternatives of Type (a)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
      1.0     .06     .05     .05     .05     .05     .05
      1.5     .06     .07     .07     .08     .08     .07
      2.0     .06     .10     .12     .14     .14     .12
      2.5     .11     .16     .20     .24     .24     .19
      3.0     .14     .23     .29     .35     .35     .28
      3.5     .17     .30     .39     .46     .46     .37
      4.0     .21     .38     .48     .56     .56     .45
      4.5     .24     .45     .56     .65     .64     .53
      5.0     .28     .51     .64     .72     .72     .60
      5.5     .31     .57     .70     .78     .77     .66
      6.0     .35     .63     .75     .82     .82     .72
      7.0       ?     .71     .83     .89     .89     .80
      8.0     .47     .78     .88     .93     .93     .86
      9.0     .52     .83     .91     .95     .95     .90
     10.0     .57     .87     .94     .97     .97     .93
     12.0     .65     .92     .97     .99     .99     .96
     14.0     .71     .95     .98     .99     .99     .98
     16.0     .76     .96     .99    1.00    1.00     .99
     18.0     .80     .97     .99    1.00    1.00     .99
     20.0     .83     .98    1.00    1.00    1.00    1.00


TABLE 12

$P\{F_L > F_{.05} \mid K = 15,\ MA = 24\}$

Alternatives of Type (b)

θ \ (M,A)  (12,2)   (8,3)   (6,4)   (4,6)   (3,8)  (2,12)
     1.00     .06     .05     .05     .05     .05     .05
     1.25       -     .14     .18     .21     .21     .17
     1.50       -     .48     .60     .69     .69     .57
     1.75       -     .82     .91     .95     .95     .89
     2.00       -     .96     .99    1.00    1.00     .98
     2.25       -     .99    1.00    1.00    1.00    1.00
     2.50     .95    1.00    1.00    1.00    1.00    1.00


1.4. The Choice of Subgroup Size. One of the purposes of this investigation was to determine the optimum choice of subgroup size. It is desirable to make the choice in such a way that the resulting test is "best" in the following sense:

Let $T_c(M_c, A_c)$ denote the test procedure generated by choosing $M_c$ subgroups of size $A_c$, where $M_c A_c = J$, $c = 1, 2, \dots, C$, and where C is the total number of possible subdivisions. Denote by $\alpha_c$ the probability of Type I error for the test procedure $T_c$ when the nominal probability of such error is $\alpha$. Let $P_c(\theta')$ denote the power of $T_c$ against a particular alternative determined by $\theta'$.

Then we shall say that $T_{c^*}(M_{c^*}, A_{c^*})$ is best if

$$ \alpha_{c^*} \le \alpha_c \quad\text{and}\quad P_{c^*}(\theta') \ge P_c(\theta') $$

for all $c = 1, 2, \dots, c^*-1, c^*+1, \dots, C$, and for all $\theta'$.
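The enumeration of subdivisions and the "best" criterion can be sketched as follows. The function names are hypothetical, and the criterion is applied only over whatever alternatives the user supplies. As an illustration, `alpha_c` and `power_c` hold entries copied from Table 5 (the θ = 1.0 row as the actual Type I error, and θ = 2 to 6 as the alternatives).

```python
def subdivisions(J):
    """All (M, A) with M * A = J, M > 1, A > 1, in order of increasing subgroup size A."""
    return [(J // A, A) for A in range(2, J // 2 + 1) if J % A == 0]


def best_subdivisions(alpha_c, power_c):
    """Subdivisions c* with alpha_{c*} <= alpha_c and P_{c*} >= P_c at every alternative, for all c != c*.

    alpha_c: dict (M, A) -> actual Type I error.
    power_c: dict (M, A) -> powers at a common list of alternatives theta'.
    """
    best = []
    for cs in alpha_c:
        if all(alpha_c[cs] <= alpha_c[c]
               and all(ps >= p for ps, p in zip(power_c[cs], power_c[c]))
               for c in alpha_c if c != cs):
            best.append(cs)
    return best


# Entries of Table 5 (K = 10, MA = 12, Type (a)) at theta = 2, 3, 4, 5, 6.
alpha_c = {(6, 2): .05, (4, 3): .04, (3, 4): .04, (2, 6): .05}
power_c = {(6, 2): [.06, .09, .12, .15, .18],
           (4, 3): [.07, .13, .19, .26, .33],
           (3, 4): [.08, .15, .22, .31, .38],
           (2, 6): [.08, .15, .22, .29, .36]}
print(subdivisions(12), best_subdivisions(alpha_c, power_c))
```

Over θ = 2 to 6 the (3,4) subdivision satisfies the criterion for Table 5; adding the θ = 1.5 row, where (2,6) is marginally higher (.06 against .05), leaves no subdivision that qualifies, which illustrates why a numerical table cannot certify one subdivision as best in this strict sense.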

Since the results given in 1.3 are numerical in nature, and only two different alternatives have been considered, it will not be possible to say that one particular mode of subdivision is the best in the above sense. We can, however, make some general remarks concerning these results.

It is interesting to note that corresponding to an increase in subgroup size, or to an increase in the number of subgroups, there is an increase in power. The former effect is due to the fact that the variance of the variate $z_{km}$ decreases as the subgroup size increases. The latter effect is due to the fact that the number of degrees of freedom in the denominator of the F-ratio increases as the number of subgroups is increased. These effects may be seen to hold true for the cases considered numerically in 1.3, and will be shown to be true asymptotically in section 1.7 below. However, in the problem under consideration, the total sample size is constant; thus an increase in subgroup size must be accompanied by a decrease in the number of subgroups, and vice versa. It is therefore necessary to balance these two effects in the most advantageous manner.
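The two competing effects can be made concrete under an assumption the report does not use here: if the populations are normal, $n s^2/\sigma^2$ is $\chi^2_n$ with n = A - 1, and the variance of the logarithm of a $\chi^2_n$ variable is the trigamma function $\psi'(n/2)$ (a standard result). The sketch below (hypothetical helper `sigma_L2`) tabulates, for K = 10 and J = 24, how this variance falls as A grows, how the factor $M/\sigma_L^2$ that multiplies $\sum_k \beta_k^2$ in the non-centrality (1.12) of Section 1.5 rises, and how the denominator degrees of freedom K(M-1) fall.

```python
from scipy import special


def sigma_L2(A):
    """Var(ln s^2) for a subgroup of size A from a normal population: trigamma((A - 1) / 2)."""
    return float(special.polygamma(1, (A - 1) / 2.0))


K, J = 10, 24
for M, A in [(12, 2), (8, 3), (6, 4), (4, 6), (3, 8), (2, 12)]:
    s2 = sigma_L2(A)
    # M / sigma_L^2 scales the non-centrality; K(M - 1) is the within-groups df of F_L.
    print(f"(M,A)=({M},{A})  sigma_L^2={s2:.4f}  M/sigma_L^2={M / s2:.2f}  df2={K * (M - 1)}")
```

The non-centrality factor grows steadily with A while the error degrees of freedom shrink, which is the balance the text describes; normal theory alone does not say where the optimum lies, since the $z_{km}$ are not normal.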

On the basis of the numerical results, it would appear that for the cases considered in Tables 1-12 we should make the following subdivisions:

(i) For samples of size 12: 3 subgroups of size 4;

(ii) For samples of size 24: 4 subgroups of size 6, or 3 subgroups of size 8.

However, it might be dangerous to formulate a general rule from the few particular cases considered here.

1.5. Comparison of Power with Standard F-test. It has been suggested (see Scheffé [36], p. 86) that the power of the Analysis of Variance of Logarithms of Variances test, which we shall hereafter refer to as the L.V. test, can be computed by noting that $F_L$ has approximately a non-central F-distribution. Therefore, it is of interest to compare the power of the L.V. test computed in 1.3 with the power computed by assuming that $z_{km}$ is exactly normally distributed and using Tang's Tables [38] or, equivalently, the Pearson and Hartley charts [31] for the power of the F-test.

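It is convenient to rewrite (1.2) (the logarithms to the base 10 may be replaced by natural logarithms, since this merely alters each $z_{km}$ by a constant multiplier, which leaves $F_L$ unchanged) as

$$ z_{km} = \ln n + \ln s_{km}^2, \qquad s_{km}^2 = \frac{1}{n}\sum_{a=1}^{A} (y_{kma} - \bar y_{km\cdot})^2, $$

and where $n = A - 1$.

Then

$$ z_{km} = \Big[\ln n + E\Big(\ln\frac{s_{km}^2}{\sigma_k^2}\Big)\Big] + \ln\sigma_k^2 + \Big[\ln\frac{s_{km}^2}{\sigma_k^2} - E\Big(\ln\frac{s_{km}^2}{\sigma_k^2}\Big)\Big] = \mu + \beta_k + e_{km} \qquad \ldots(1.4) $$

where

$$ \mu = \ln n + E\Big(\ln\frac{s_{km}^2}{\sigma_k^2}\Big) + \frac{1}{K}\sum_{k=1}^{K}\ln\sigma_k^2, \qquad \beta_k = \ln\sigma_k^2 - \frac{1}{K}\sum_{k=1}^{K}\ln\sigma_k^2, $$

$$ e_{km} = \ln\frac{s_{km}^2}{\sigma_k^2} - E\Big(\ln\frac{s_{km}^2}{\sigma_k^2}\Big). $$

(The expectation $E[\ln(s_{km}^2/\sigma_k^2)]$ is the same for every k when the K populations differ only in scale.) It follows that

$$ E(e_{km}) = 0, \qquad \ldots(1.5) $$

$$ \operatorname{Var}(e_{km}) = \operatorname{Var}(\ln s_{km}^2), \qquad \ldots(1.6) $$

and that

$$ \sum_{k=1}^{K} \beta_k = 0. \qquad \ldots(1.7) $$

The hypothesis $H_0$ of variance homogeneity and the alternative $H_1$ may now be written, in terms of the above notation, as

$$ H_0:\ \sum_{k=1}^{K} \beta_k^2 = 0, \qquad \ldots(1.8) $$

$$ H_1:\ \sum_{k=1}^{K} \beta_k^2 \ne 0. \qquad \ldots(1.9) $$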

It is easily shown that the expectations of the mean squares which occur in the $F_L$-ratio (1.3) are

$$ E[S_B/(K-1)] = \sigma_L^2 + \frac{M}{K-1}\sum_{k=1}^{K}\beta_k^2, \qquad \ldots(1.10) $$

$$ E[S_W/K(M-1)] = \sigma_L^2, \qquad \ldots(1.11) $$

where $\sigma_L^2 = \operatorname{Var}[\ln s^2]$ for subgroups of size A.
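Under the assumption that the populations are normal, $n s^2/\sigma^2$ is $\chi^2_n$ with n = A - 1, so that $E[\ln(s^2/\sigma^2)] = \psi(n/2) - \ln(n/2)$ and $\sigma_L^2 = \psi'(n/2)$, where ψ and ψ' are the digamma and trigamma functions. These are standard results rather than statements from the report. The sketch below (hypothetical function name) checks both against simulation for a chosen subgroup size.

```python
import numpy as np
from scipy import special


def check_log_variance_moments(A, reps=200_000, seed=1):
    """Simulated vs normal-theory mean and variance of ln(s^2 / sigma^2) for subgroups of size A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, A))              # sigma^2 = 1, so s^2 / sigma^2 = s^2
    ln_s2 = np.log(x.var(axis=1, ddof=1))           # s^2 with divisor n = A - 1, as above
    n = A - 1
    theory_mean = special.digamma(n / 2) - np.log(n / 2)   # E ln(chi2_n / n)
    theory_var = special.polygamma(1, n / 2)               # sigma_L^2
    return ln_s2.mean(), float(theory_mean), ln_s2.var(), float(theory_var)


print(check_log_variance_moments(A=4))
```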

It is well known that when $e_{km}$ has a normal distribution, the ratio of mean squares is distributed, under the alternative hypothesis, as a non-central F,

$$ F'_{K-1,\,K(M-1)}(\delta), $$

where the square of the non-centrality parameter $\delta$ is given by

$$ \delta^2 = \frac{M}{\sigma_L^2}\sum_{k=1}^{K}\beta_k^2. \qquad \ldots(1.12) $$

Tables of $P\{F'_{K-1,K(M-1)}(\delta) > F_\alpha\}$ are given by Tang [38], and charts are given by Pearson and Hartley [31], as mentioned above. In Tang's notation, we introduce a quantity $\phi$ as a measure of non-centrality, where

$$ \phi = \frac{\delta}{\sqrt{K}}. \qquad \ldots(1.13) $$


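For alternatives of "single slippage" [Type (a)] it is easily shown that

$$ \sum_{k=1}^{K}\beta_k^2 = \frac{K-1}{K}(\ln\theta)^2, $$

from which it is seen that

$$ \delta^2 = \frac{M(K-1)}{K\sigma_L^2}(\ln\theta)^2, \qquad \ldots(1.14) $$

whence

$$ \delta = \frac{|\ln\theta|}{\sigma_L}\sqrt{\frac{M(K-1)}{K}}, \qquad \ldots(1.15) $$

$$ \phi = \frac{|\ln\theta|}{K\sigma_L}\sqrt{M(K-1)}. \qquad \ldots(1.16) $$

For alternatives of "equal multiple slippage" [Type (b)] it can similarly be shown that

(i) For K even,

$$ \sum_{k=1}^{K}\beta_k^2 = K(\ln\theta)^2, \qquad \delta^2 = \frac{MK}{\sigma_L^2}(\ln\theta)^2, $$

and

$$ \phi = \frac{|\ln\theta|}{\sigma_L}\sqrt{M}. \qquad \ldots(1.17) $$

(ii) For K odd,

$$ \sum_{k=1}^{K}\beta_k^2 = (K-1)(\ln\theta)^2, \qquad \delta^2 = \frac{M(K-1)}{\sigma_L^2}(\ln\theta)^2, $$

and

$$ \phi = \frac{|\ln\theta|}{\sigma_L}\sqrt{\frac{M(K-1)}{K}}. \qquad \ldots(1.18) $$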
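Equations (1.12)-(1.18) lead directly to a normal-theory power calculation. This sketch assumes normal populations (so that $\sigma_L^2 = \psi'((A-1)/2)$) and uses SciPy's non-central F distribution in place of Tang's tables, so it corresponds to the "Tang" kind of calculation, not to the David-Johnson values; like that calculation, it ignores the non-normality of the $e_{km}$. The function name is hypothetical.

```python
import numpy as np
from scipy import special, stats


def normal_theory_power(variances, M, A, alpha=0.05):
    """(phi, power) of the L.V. test if the e_km were exactly normal."""
    v = np.asarray(variances, dtype=float)
    K = v.size
    beta = np.log(v) - np.log(v).mean()                  # beta_k of (1.4)
    s2L = float(special.polygamma(1, (A - 1) / 2.0))     # sigma_L^2 under normality
    delta2 = M * np.sum(beta ** 2) / s2L                 # delta^2, equation (1.12)
    phi = np.sqrt(delta2 / K)                            # Tang's phi, equation (1.13)
    df1, df2 = K - 1, K * (M - 1)
    f_alpha = stats.f.ppf(1 - alpha, df1, df2)
    return phi, stats.ncf.sf(f_alpha, df1, df2, delta2)  # P{F'(delta) > F_alpha}


# Type (a), K = 2, three subgroups of size 4, theta = 3.
print(normal_theory_power(np.array([3.0, 1.0]), M=3, A=4))
```

Because $\phi$ is computed from the $\beta_k$ directly, the same function covers Types (a) and (b) and reproduces (1.16)-(1.18) without separate formulas.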


Table 13 compares the power of the L.V. test computed by the method of David and Johnson with the power computed from Tang's tables. For K = 2, the power is given for 3 subgroups of size 4. For K = 5, the comparison is given for 4 subgroups of size 6.

TABLE 13

Comparison of two methods of computing the power of the L.V. test
Power when the nominal significance level is .05.

                  K = 2                      K = 5
   φ     David/Johnson    Tang     David/Johnson    Tang
  1.0         .24          .20          .31          .30
  1.5         .42          .37          .63           ?
  2.0         .60          .57          .88          .88
  2.5         .75          .75          .98          .98

It should be noted that the results obtained from Tang's tables were found under the assumption that the $e_{km}$ are normally distributed. Thus the differences observed between the two methods of computation in Table 13 illustrate the effect of the $e_{km}$ being, in fact, not normally distributed. It will be noticed that the effect of non-normality is greater for K = 2 than it is for K = 5. In fact, it will be seen in Section 1.7 that as either the number of subgroups or the subgroup size is increased, the distribution of the test statistic becomes nearer to that which would be obtained under normal theory.


1.6. Comparison of the Power of the L.V. Test with that of Other Tests of Homogeneity. One of the standard tests for equality of variances is Bartlett's test. Box [5] has shown that this test is very sensitive to the assumption that the sampled population is normal. Indeed, it has been suggested that the method serves equally well as a test for normality as for testing equality of variances.

Box and Andersen [6] have suggested a modified form of Bartlett's statistic which appears to make the test more robust. Their test statistic is of the form

$$ M' = \frac{M}{1 + \tfrac{1}{2}\,k_4/k_2^2} \qquad \ldots(1.19) $$

where M is Bartlett's statistic, and $k_2$, $k_4$ are Fisher's k-statistics.
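Bartlett's M and the modified statistic can be sketched as follows. Two points are assumptions rather than statements from the report: the form of M' is taken from (1.19), and the k-statistics $k_2$, $k_4$ are computed here from the pooled within-group deviations, which is one plausible choice. The function names are hypothetical; `scipy.stats.bartlett` reports M/C, where C is Bartlett's scale factor, which gives a check on M.

```python
import numpy as np
from scipy import stats


def bartlett_M(groups):
    """Bartlett's M and its scale factor C (the usual chi-square statistic is M / C)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    nu = np.array([g.size - 1 for g in groups], dtype=float)   # degrees of freedom per group
    s2 = np.array([g.var(ddof=1) for g in groups])
    s2_pooled = np.sum(nu * s2) / nu.sum()
    M = nu.sum() * np.log(s2_pooled) - np.sum(nu * np.log(s2))
    C = 1.0 + (np.sum(1.0 / nu) - 1.0 / nu.sum()) / (3.0 * (len(groups) - 1))
    return M, C


def modified_M(groups):
    """M' = M / (1 + k4 / (2 k2^2)); k-statistics from pooled within-group deviations (assumption)."""
    M, _ = bartlett_M(groups)
    dev = np.concatenate([np.asarray(g, dtype=float) - np.mean(g) for g in groups])
    k2, k4 = stats.kstat(dev, 2), stats.kstat(dev, 4)
    return M / (1.0 + 0.5 * k4 / k2 ** 2)


rng = np.random.default_rng(3)
groups = [rng.normal(scale=s, size=20) for s in [1.0] * 9 + [1.5]]
M, C = bartlett_M(groups)
print(M / C, stats.bartlett(*groups)[0], modified_M(groups))
```

For normal data $k_4$ is near zero and M' stays close to M; heavy-tailed data inflate $k_4$ and shrink M', which is the robustness adjustment the modification aims at.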

They also performed a sampling experiment in order to compare the power of the test using M with that using M'. Samples of size 20 were taken from each of 10 normal populations. The following alternative hypothesis was considered:

0^  -  <r*  -  2.6  ...d.20) 
An alternative procedure has been suggested by Levene [23]. His method is to apply one of the following transformations:

$$ T_1 = |y_{kj} - \bar y_{k\cdot}|, \qquad T_2 = (T_1)^2, \qquad T_3 = \log T_1, \qquad T_4 = \sqrt{T_1}. \qquad \ldots(1.21) $$

(The above transformations are in the notation of our discussion; Levene used a different notation.) Then, using $T_i$ (i = 1, 2, 3, or 4), carry out a one-way Analysis of Variance.
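Levene's procedure is short to express in code. In this sketch (hypothetical names), $T_3$ uses the natural logarithm, which gives the same F as any other base, because F is unchanged when every value is multiplied by the same constant. With $T_1$ the result coincides with `scipy.stats.levene(..., center="mean")`, which implements that case.

```python
import numpy as np
from scipy import stats

TRANSFORMS = {
    1: lambda d: d,             # T1 = |y_kj - ybar_k.|
    2: lambda d: d ** 2,        # T2 = T1^2
    3: lambda d: np.log(d),     # T3 = log T1 (undefined if some deviation is exactly 0)
    4: lambda d: np.sqrt(d),    # T4 = sqrt(T1)
}


def levene_type_F(groups, i=1):
    """One-way ANOVA F (and p-value) on T_i of the absolute deviations from group means."""
    t = [TRANSFORMS[i](np.abs(np.asarray(g, dtype=float) - np.mean(g))) for g in groups]
    return stats.f_oneway(*t)


rng = np.random.default_rng(4)
groups = [rng.normal(scale=s, size=20) for s in (1.0, 1.0, 1.5)]
for i in (1, 2, 3, 4):
    print(i, levene_type_F(groups, i)[0])
```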


Levene did an extensive sampling experiment to estimate the power of the tests using the above transformations. In sampling experiments with K = 2 and samples of size 20, $T_3$ had extremely poor power and, therefore, was not included in any further experiments. For K = 10, he obtained one thousand values of F under the null hypothesis and under the hypothesis given by (1.20) above.

In the present investigation, the power has been considered both for the L.V. test and the L.R. test* under the same alternative. In this particular case the samples of size 20 were each divided into 4 subgroups of size 5, since this subdivision appeared to give better power than any other.

* L.R. test is an abbreviation for Log-range test. This test is suggested as an alternative to the L.V. test, and is discussed in section 1.3.
Olds and Kennard [27] have studied the control chart for ranges as a test for homogeneity. They conducted a relatively small sampling experiment (25 control charts) for the hypothesis given by (1.20) with the purpose of comparing their results with those of Box and Andersen.

Table 14 gives a comparison of the power of these various tests under the hypothesis given by (1.20).


TABLE 14

Power comparisons of some tests for homogeneity
when the nominal significance level is .05

Test                          Power
Bartlett                       .815
Modified Bartlett              .810
$T_1$                          .680
$T_2$                          .656
$T_4$                          .577
L.V.                           .48
Control chart for ranges       .44


As seen from the table, the modified Bartlett test has power almost equal to that of the Bartlett test. There is a considerable loss in power for the L.V. test and the control chart for ranges. It is thought that the L.R. test has less power than the L.V. test, but this matter seems to require further investigation. For the L.V. test the loss of power may be compensated by the fact that the test would be expected to be "robust".

The tests based on $T_j$ have good power and have the advantage that they are easier to apply than the Bartlett or modified Bartlett test. From a theoretical point of view they are difficult to investigate, since under any given alternative hypothesis the transformed variates will have not only different means but also different variances for each population. Also, the transformed variates will no longer be independent.

It is seen that, against the alternative considered, the L.V. test and the L.R. test have considerably less power than either of the Bartlett tests or any of the $T_j$ tests ($j = 1, 2, 4$). However, there is some evidence to suggest that all of these last-mentioned tests are less robust than the L.V. test with respect to departures from underlying normality. It is clear that there is a need for further investigation on this subject.

1.7. Some Asymptotic Results for the L.V. Test. The asymptotic distribution of the test statistic is now considered as $K$, $M$, or $A$ tends to infinity. It is assumed that the density function of the observations satisfies certain regularity conditions and has at least the first four moments finite. The following definitions are needed here (see Scheffé [36], p. 412):

Definition 1. If $x_1, \ldots, x_\nu$ are normal independent variables with $x_i$ having mean $\xi_i$ and unit variance, then the random variable

$$\chi'^2_\nu(\delta) = \sum_{i=1}^{\nu} x_i^2$$

is called a non-central chi-square variable with $\nu$ degrees of freedom. The quantity $\delta$ is known as the non-centrality parameter, where

$$\delta^2 = \sum_{i=1}^{\nu} \xi_i^2 .$$

It should be noted that when $\delta$ is zero the distribution reduces to that of a central chi-square.

Definition 2. If $\chi'^2_{\nu_1}(\delta)$ and $\chi^2_{\nu_2}$ are independent random variables with distributions as defined in Definition 1, then the distribution of the ratio of these quantities, each divided by its degrees of freedom,

$$F'_{\nu_1,\nu_2}(\delta) = \frac{\chi'^2_{\nu_1}(\delta)/\nu_1}{\chi^2_{\nu_2}/\nu_2},$$

is called a non-central F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom, and non-centrality parameter $\delta$.
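Definition 1 can be checked by simulation, using the standard results that $\chi'^2_\nu(\delta)$ has mean $\nu + \delta^2$ and variance $2\nu + 4\delta^2$. The sketch below uses hypothetical function names and an arbitrary choice of means $\xi_i$; it draws the sum of squares directly and also forms one draw of the ratio in Definition 2 with a central chi-square denominator.

```python
import random


def noncentral_chi_square(xis, rng):
    """One draw of sum(x_i^2) with independent x_i ~ N(xi_i, 1) (Definition 1)."""
    return sum(rng.gauss(xi, 1.0) ** 2 for xi in xis)


def noncentral_f(xis, nu2, rng):
    """One draw of the ratio in Definition 2, central chi-square in the denominator."""
    numerator = noncentral_chi_square(xis, rng) / len(xis)
    denominator = noncentral_chi_square([0.0] * nu2, rng) / nu2
    return numerator / denominator


rng = random.Random(12345)
xis = [1.0, 2.0, 0.5]  # nu = 3, delta^2 = 1 + 4 + 0.25 = 5.25
draws = [noncentral_chi_square(xis, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)  # expected to lie near nu + delta^2 = 8.25
f_draw = noncentral_f(xis, 10, rng)
```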

We shall now quote some lemmas which will be useful in our discussion of the limiting distributions* of the test statistic.

* "Distribution function" is used here in the cumulative sense.

Lemma 1 (Cramér [8], sec. 20.6). Let $u_1, u_2, \ldots$ be a sequence of random variables, with distribution functions $F_1, F_2, \ldots$. Suppose that $F_n(x)$ tends to a distribution function $F(x)$ as $n \to \infty$. Let $v_1, v_2, \ldots$ be another sequence of random variables, and suppose that $v_n$ converges in probability to a constant $c > 0$. Let $w_n = u_n v_n$. Then the distribution function of $w_n$ tends to $F(x/c)$.
Lemma 2 (Cramér [8], sec. 27.3). Let $x_1, x_2, \ldots$ be a sequence of independent observations from a population with distribution function $F(x)$. Let

$$m_\nu^{(N)} = \frac{1}{N} \sum_{i=1}^{N} x_i^\nu$$

denote the $\nu$-th sample moment, and let $\alpha_\nu = E[m_\nu^{(N)}]$. Let $g$ be any rational function, or power of a rational function. If $\alpha_1, \ldots, \alpha_k$ are finite and $g(\alpha_1, \ldots, \alpha_k)$ is defined, then

$$g\left(m_1^{(N)}, \ldots, m_k^{(N)}\right) \to g(\alpha_1, \ldots, \alpha_k) \quad \text{in probability, as } N \to \infty .$$


Lemma 3 (Due to Mann and Wald [21], Theorem 5. The following statement of the theorem is given by Rao [34], Lemma 3.) Suppose $F_N(x_1, \ldots, x_k)$ is the distribution function of $(x_1^{(N)}, \ldots, x_k^{(N)})$ and that $F_N \to F$, as $N \to \infty$, at all continuity points of $F$. If $g(x_1, \ldots, x_k)$ is a Borel measurable function such that the set $D(g)$ of discontinuity points of $g$ satisfies $F[D(g)] = 0$, then the distribution function $G_N$ of $g(x_1^{(N)}, \ldots, x_k^{(N)})$ converges, as $N \to \infty$, to $G$, where $G$ is the distribution function of $g(x_1, \ldots, x_k)$.




Lemma 4 (A proof is given by Rao, M. M. [34], p. 8.) Let $a_n$ be a sequence of real numbers such that $\lim_{n\to\infty} a_n = a$, $a$ finite. Let $F_n(y)$ be a sequence of distribution functions such that $F_n \to F$. Then, if $a$ is a continuity point of $F$, $\lim_{n\to\infty} F_n(a_n) = F(a)$.


Lemma 5 (Cramér [8], sec. 28.3). Let $x_1, \ldots, x_N$ be a sequence of independent observations from a population with distribution function $F(x)$. Let

$$s^2 = \frac{1}{n} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad n = N - 1 .$$

Then $\sqrt{n}\,(s^2 - \sigma^2)$ is asymptotically normal $[0, \sigma^4(2 + \gamma_2)]$ as $N \to \infty$, where $\gamma_2$ is given by the following function of the moments of the distribution of $x_i$:

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\sigma^4} - 3 .$$

Proof: Cramér proves this lemma for the variate

$$(s')^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 .$$

It follows from Lemma 1 that the variate $s^2 = \frac{N}{N-1}(s')^2$ has the same limiting distribution, since $\frac{N}{N-1}$ converges to one as $N \to \infty$.

Lemma 6 (Rao, C. R. [33], p. 208). Let $T_N$ be a sequence of real-valued statistics with the property that $\sqrt{N}\,[T_N - \theta]$ is asymptotically normally distributed $[0, \psi^2(\theta)]$. Then if $f$ is any function of $T_N$, it follows that $\sqrt{N}\,[f(T_N) - f(\theta)]$ is asymptotically normal $[0, (df/d\theta)^2 \psi^2(\theta)]$, provided that $df/d\theta$ is continuous in the neighborhood of $\theta$.


Lemma 7. The limiting distribution of $\sqrt{n}\,(\ln s^2 - \ln \sigma^2)$ is the normal distribution with mean zero and variance $2 + \gamma_2$.

Proof: In Lemma 6, let $T_N = s^2$, $N = n$, $\theta = \sigma^2$, $\psi(\theta) = \sigma^2\sqrt{2 + \gamma_2}$, and $f(T_N) = \ln T_N$. It follows from Lemma 5 that $s^2$ satisfies the hypothesis of Lemma 6; since $df/d\theta = 1/\sigma^2$, the limiting variance is $\sigma^{-4} \cdot \sigma^4 (2 + \gamma_2) = 2 + \gamma_2$, and the result follows.
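Lemma 7 can be illustrated by simulation. The sketch below (hypothetical helper names; the sample size and number of replications are arbitrary choices) estimates $n \operatorname{var}(\ln s^2)$ for normal samples, where $\gamma_2 = 0$, and for rectangular samples, where $\gamma_2 = -1.2$; the lemma suggests values near $2$ and $0.8$ respectively.

```python
import math
import random


def sample_variance(xs):
    """Sample variance s^2 with divisor n = len(xs) - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def scaled_var_log_s2(draw, sample_size, reps, rng):
    """Monte Carlo estimate of n * var(ln s^2), with n = sample_size - 1."""
    logs = [math.log(sample_variance([draw(rng) for _ in range(sample_size)]))
            for _ in range(reps)]
    mean_log = sum(logs) / reps
    var_log = sum((v - mean_log) ** 2 for v in logs) / (reps - 1)
    return (sample_size - 1) * var_log


rng = random.Random(7)
normal_value = scaled_var_log_s2(lambda r: r.gauss(0.0, 1.0), 100, 3000, rng)
uniform_value = scaled_var_log_s2(lambda r: r.random(), 100, 3000, rng)
```

The smaller value for the rectangular population reflects its negative $\gamma_2$; this is the mechanism behind the corollary to Theorem 2 below.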

The proof of the theorem below follows from results used by Andrews [2] in the proof of his Theorem 5.2.

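Theorem 1. Let $H_1, H_2, \ldots$ denote a sequence of alternative hypotheses, where $H_M$ denotes the hypothesis given by

$$E(z_{km}) = \mu + \frac{\beta_k}{\sqrt{M}}, \qquad k = 1, \ldots, K, \qquad \sum_{k=1}^{K} \beta_k = 0,$$

and $\sum_{k=1}^{K} (\beta_k)^2$ is a constant. If the distribution function of $z$, say $F(z)$, satisfies the following conditions:

1) $F$ possesses a continuous derivative $F'$ except at most on a set $S$ where $\int_S dF(z) = 0$,

2) there exists a function $g$ which bounds the difference quotient, $\left|[F(z + \theta) - F(z)]/\theta\right| < g(z)$, for which $\int g(z)\,dF(z) < \infty$,

3) the variance of $z$ is finite,

then

$$F_L \to \frac{\chi'^2_{K-1}(\delta)}{K-1} \quad \text{in distribution, as } M \to \infty,$$

where

$$\delta^2 = \frac{\sum_{k=1}^{K} (\beta_k)^2}{\sigma_L^2},$$

and $\sigma_L^2$ is the variance of $\ln s^2$. In particular, under the hypothesis of variance homogeneity,

$$F_L \to \frac{\chi^2_{K-1}}{K-1} \quad \text{in distribution.}$$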
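Corollary: Denote by $F_{\nu_1,\nu_2;\alpha}$ the upper $\alpha$ percentage point of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. Then

$$\lim_{M\to\infty} P\{F_L > F_{K-1,K(M-1);\alpha}\} = P\left\{\frac{\chi'^2_{K-1}(\delta)}{K-1} > F_{K-1,\infty;\alpha}\right\}.$$

Proof: By definition we have

$$\int_{F_{\nu_1,\nu_2;\alpha}}^{\infty} dF_{\nu_1,\nu_2} = \alpha . \qquad \ldots(1.22)$$

By the preceding theorem we have

$$F_L \to \frac{\chi'^2_{K-1}(\delta)}{K-1} \quad \text{in distribution as } M \to \infty . \qquad \ldots(1.23)$$

It can also be shown that

$$\lim_{\nu_2\to\infty} F_{\nu_1,\nu_2;\alpha} = F_{\nu_1,\infty;\alpha} . \qquad \ldots(1.24)$$

Since any subsequence of a convergent sequence also converges, we have

$$F_{K-1,K(M-1);\alpha} \to F_{K-1,\infty;\alpha} \quad \text{as } M \to \infty . \qquad \ldots(1.25)$$

We have shown that $F_{K-1,K(M-1);\alpha}$ is a sequence of numbers and that the distribution functions of $F_L$ form a sequence of distribution functions which together satisfy the hypotheses of Lemma 4, and the result follows.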

Theorem 2. If $F(z)$ satisfies the conditions 1-3 of Theorem 1, then the distribution function of $F_L$ converges to that of the non-central F-distribution as the subgroup size tends to infinity, i.e.

$$F_L \to F'_{K-1,K(M-1)}(\delta) \quad \text{in distribution as } A \to \infty, \qquad \text{where } \delta^2 = \frac{M \sum_{k=1}^{K} (\beta_k)^2}{2 + \gamma_2} . \qquad \ldots(1.26)$$

In particular, under the hypothesis of variance homogeneity,

$$F_L \to F_{K-1,K(M-1)} \quad \text{in distribution.}$$

Proof: Consider the variate

$$z^*_{km} = \sqrt{A - 1}\,[z_{km} - E(z_{km})] . \qquad \ldots(1.27)$$

We note that the distribution of $F_L$ is invariant under this transformation. By Lemma 7, $z^*_{km}$ converges in distribution to that of a normal variate with mean zero and variance $2 + \gamma_2$. Since the $z^*_{km}$ are independently distributed, it follows that their joint distribution function converges to the joint distribution function of independent normal variates. The theorem then follows from an application of Lemma 3.




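Corollary: Under the hypothesis of variance homogeneity, $P\{F_L > F_{K-1,K(M-1);\alpha}\} \to \alpha$ as $A \to \infty$.

Proof: The corollary follows immediately from the definition of convergence in distribution, since $F_{K-1,K(M-1);\alpha}$ does not depend on $A$ and is a continuity point of the limiting F distribution.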

Corollary: When the sampled population is non-normal, the asymptotic power of the L.V. test, as $A \to \infty$, is greater than, equal to, or less than that which would be obtained under normal sampling, according as $\gamma_2 < 0$, $\gamma_2 = 0$, or $\gamma_2 > 0$.

Proof: It is easily shown that for any population $\gamma_2 \geq -2$. For a normal population $\gamma_2 = 0$. It can be shown that, for $K$ and $M$ fixed,

$$P\{F'_{K-1,K(M-1)}(\delta) > F_{K-1,K(M-1);\alpha}\}$$

is a monotone increasing function of $\delta$, where

$$\delta^2 = \frac{M \sum_{k=1}^{K} (\beta_k)^2}{2 + \gamma_2} .$$

If we also consider $\sum_{k=1}^{K} (\beta_k)^2$ fixed, then $\delta^2$ is a monotone decreasing function of $\gamma_2$, and the result follows.

Theorem 3. Under the null hypothesis of variance homogeneity, $F_L$ converges to one in probability as $K$ tends to infinity.

Proof: This theorem follows immediately from Lemma 2. It is well known that under the null hypothesis the numerator and the denominator of the variance ratio are unbiased estimates of the variance of the sampled variate. Since $F_L$ is the ratio of two quadratic forms, it satisfies the hypotheses of Lemma 2, and the result follows.

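Thus the asymptotic results which we have demonstrated are:

(i) Under $H_M$: $F_L \to \chi'^2_{K-1}(\delta)/(K-1)$ in distribution as $M \to \infty$, where $\delta^2 = \sum_{k=1}^{K} (\beta_k)^2 / \sigma_L^2$.
Under $H_0$: $F_L \to \chi^2_{K-1}/(K-1)$ in distribution as $M \to \infty$, and $P\{F_L > F_{K-1,K(M-1);\alpha}\} \to \alpha$.

(ii) Under the alternative of Theorem 2: $F_L \to F'_{K-1,K(M-1)}(\delta)$ in distribution as $A \to \infty$.
Under $H_0$: $F_L \to F_{K-1,K(M-1)}$ in distribution as $A \to \infty$, and $P\{F_L > F_{K-1,K(M-1);\alpha}\} \to \alpha$.

(iii) Under $H_0$: $F_L \to 1$ in probability as $K \to \infty$.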


1.8. The Analysis of Variance of Logarithms of Ranges. A procedure is now proposed that may be used as an alternative to the L.V. test for testing the hypothesis of variance homogeneity. The method will be known as the log-range test, or L.R. test. In the following subsections the procedure will be described and justified, its use will be compared with that of the L.V. test, and some properties of the test will be discussed when the sampled distribution is rectangular as well as when it is normal.

1.8.1. The Log-range Test. The procedure for applying the L.R. test is directly analogous to that for the L.V. test (see section 1.1):

(i) Divide the $J$ observations $y_{kj}$ within each of the $K$ groups into $M$ subgroups of size $A$ ($MA = J$, $M > 1$, $A > 1$).

(ii) Calculate the range of the observations in each subgroup, i.e., calculate

$$R_{km} = y_{km(A)} - y_{km(1)}, \qquad \ldots(1.31)$$

where $y_{km(1)}$ and $y_{km(A)}$ denote the smallest and the largest of the $A$ observations in the $m$-th subgroup of the $k$-th group.

In a previous technical note [26], section 2, the use of the logarithmic transformation applied to variances was justified. The same argument will be repeated here with regard to justification of the logarithmic transformation applied to range.


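Now it is mathematically convenient to consider, instead of the logarithm $z_{km}$ of the range, the variate $2z_{km}$, and moreover to take the logarithms to the base $e$ rather than to the base 10. Clearly the theory of the test will be unaffected by such changes, and we may write

$$z'_{km} = \ln R_{km}^2 = \ln \frac{R_{km}^2}{\sigma_k^2} + \ln \sigma_k^2 . \qquad \ldots(1.32)$$

It follows that

$$E(z'_{km}) = E\left(\ln \frac{R_{km}^2}{\sigma_k^2}\right) + \ln \sigma_k^2 \qquad \ldots(1.33)$$

$$= c + \ln \sigma_k^2, \qquad \ldots(1.34)$$

where

$$c = E\left(\ln \frac{R_{km}^2}{\sigma_k^2}\right) .$$

Since $R_{km}/\sigma_k$ has the same distribution in every group when the groups differ only in scale, $c$ does not depend on $k$.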

Equations (1.32) and (1.34) show that testing the hypothesis of equality of means of the variate $z_{km}$, or $z'_{km}$, is equivalent to testing the hypothesis of variance homogeneity. It would be desirable for the transformed variable to have approximately a normal distribution. In section 1.7 it was demonstrated that this is true for the logarithm-of-variance transformation when $A$, the subgroup size, is large. However, it will be shown in 1.8.3 below that, in particular, when the sampled population is rectangular, the asymptotic distribution of log-range, as the sample size is increased without limit, is not normal.
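A minimal sketch of the L.R. statistic follows, assuming (by analogy with the L.V. test) that the logarithms of the subgroup ranges are submitted to a one-way Analysis of Variance with the $K$ groups as classes. The function names are hypothetical. The final comparison illustrates the remark above that using $2 \ln R_{km}$ in place of $\log_{10} R_{km}$ leaves the test unaffected, since the two differ only by a constant factor and the F ratio is unchanged by rescaling.

```python
import math
import random


def one_way_anova_f(groups):
    """One-way Analysis of Variance F ratio (between MS / within MS)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def lr_statistic(samples, subgroup_size, log=math.log10, factor=1.0):
    """L.R. statistic: F ratio of factor * log(range) over the subgroups."""
    z = []
    for sample in samples:
        if len(sample) % subgroup_size != 0:
            raise ValueError("sample size J must equal M * A")
        subgroups = [sample[i:i + subgroup_size]
                     for i in range(0, len(sample), subgroup_size)]
        z.append([factor * log(max(s) - min(s)) for s in subgroups])
    return one_way_anova_f(z)


rng = random.Random(3)
samples = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(4)]
f_log10 = lr_statistic(samples, 5)                            # z_km = log10 R_km
f_ln_r2 = lr_statistic(samples, 5, log=math.log, factor=2.0)  # z'_km = ln R_km^2
```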
1.8.2. The L.R. Test for a Normal Population. Some properties of the L.R. test when the sampled population is normal will now be discussed.

It is well known that, except for samples of size 2, the distribution of range is expressible only in integral form. The limiting distribution of range is not known, and it appears that the problem of finding the limiting distribution of the logarithm of the range is intractable. It is possible, however, to find the cumulants of the logarithm of the range by numerical methods.

The cumulative distribution of range has been computed and tabled by Harter and Clemm [17]. The density function of range was obtained by numerical differentiation of the cumulative distribution function. Finally, the first seven cumulants of log-range were obtained by numerical integration. The method was checked by computing the first four cumulants of range and comparing these with the tabled results of Harter and Clemm.

The first seven standardized cumulants for the distribution of $\ln R^2$ are given in Table 15. For comparison the corresponding cumulants of $\ln s^2$ are also given; the latter were computed from the tables of Davis [12]. For samples of size 2, $R^2 = 2s^2$, and in this case $E(\ln R^2) = \ln 2 + E(\ln s^2)$.

For $N = 2$ the cumulants of order two or higher are the same for the two variables. For $N$ larger than 2, the standardized cumulants of $\ln R^2$ of order higher than two are closer to the normal-theory value of zero than are the corresponding cumulants of $\ln s^2$, and the variance of $\ln R^2$ is larger than the variance of $\ln s^2$.
1.8.3. The L.R. Test when the Sampled Distribution is Rectangular. In section 1.7 it was shown that as the subgroup size increases, the distribution of the test statistic $F_L$ converges to that of the normal-theory $F$ or $F'$, according to whether the null or an alternative hypothesis is true. It will be demonstrated below, however, that, in general, this result is not true of the corresponding statistic obtained in the L.R. test. In particular, it will be shown that when the sampled population has a rectangular distribution, the distribution of the standardized variable

$$Z^* = \frac{z^* - E(z^*)}{\sigma_{z^*}}, \qquad z^* = \ln R,$$

converges to that of a linear function of a chi-square variable with four degrees of freedom.

For this case it will also be shown that $\operatorname{var}[z^*]$ is of order $1/N^2$, where $N$ is the sample, or subgroup, size; in section 1.7 it was seen that $\operatorname{var}[z]$ is of order $1/N$. Also, it should be noted that whereas for the L.V. test $P\{F_L > F_\alpha\} \to \alpha$ as subgroup size increases, in view of the convergence noted above this does not hold true in general for the L.R. test.

The distribution of the logarithm of the range of a sample of size $N$ drawn at random from a rectangular distribution will now be considered.


Let $x$ be a random variable with probability density function

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{elsewhere}, \end{cases} \qquad \ldots(1.35)$$

and let $R$ denote the range of a sample of size $N$ ($N > 1$), drawn randomly and independently from this population. Then the density function of $R$ is given by

$$g(R) = \begin{cases} N(N-1) R^{N-2} (1 - R), & 0 < R < 1, \\ 0, & \text{elsewhere}. \end{cases} \qquad \ldots(1.36)$$
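As a numerical check on (1.36), the following sketch (hypothetical names; midpoint-rule integration plus a small simulation) confirms that $g(R)$ integrates to one and that its mean agrees with the simulated mean range of rectangular samples. Integrating (1.36) directly gives $E(R) = N(N-1)\left[\frac{1}{N} - \frac{1}{N+1}\right] = \frac{N-1}{N+1}$.

```python
import random


def range_density(r, n):
    """Density (1.36) of the range of n rectangular [0, 1] observations."""
    if 0.0 < r < 1.0:
        return n * (n - 1) * r ** (n - 2) * (1.0 - r)
    return 0.0


def midpoint_integral(f, steps=100000):
    """Midpoint-rule integral of f over (0, 1)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


n = 5
total = midpoint_integral(lambda r: range_density(r, n))
mean_range = midpoint_integral(lambda r: r * range_density(r, n))

rng = random.Random(11)
ranges = []
for _ in range(20000):
    xs = [rng.random() for _ in range(n)]
    ranges.append(max(xs) - min(xs))
simulated_mean = sum(ranges) / len(ranges)
```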


If we let $z^* = \ln R$, then $R = e^{z^*}$ and $dR = e^{z^*} dz^*$; thus

$$g(z^*) = \begin{cases} N(N-1) e^{(N-1) z^*} (1 - e^{z^*}), & -\infty < z^* < 0, \\ 0, & \text{elsewhere}. \end{cases} \qquad \ldots(1.37)$$


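The characteristic function of $z^*$ is then given by

$$\begin{aligned} \phi_{z^*}(t) = E(e^{itz^*}) &= N(N-1) \int_{-\infty}^{0} e^{(it + N - 1) z^*} (1 - e^{z^*}) \, dz^* \\ &= N(N-1) \left[ \int_{-\infty}^{0} e^{(it + N - 1) z^*} dz^* - \int_{-\infty}^{0} e^{(it + N) z^*} dz^* \right] \\ &= N(N-1) \left[ \frac{1}{N - 1 + it} - \frac{1}{N + it} \right] \\ &= \left(1 + \frac{it}{N-1}\right)^{-1} \left(1 + \frac{it}{N}\right)^{-1} . \end{aligned} \qquad \ldots(1.38)$$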
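and so the cumulant generating function of $z^*$ is given by

$$\psi_{z^*}(t) = \ln \phi_{z^*}(t) = -\ln\left(1 + \frac{it}{N-1}\right) - \ln\left(1 + \frac{it}{N}\right), \qquad \ldots(1.39)$$

whence we find that the $j$-th cumulant of $z^*$ is given by

$$\kappa_j = (-1)^j (j-1)! \left[ N^{-j} + (N-1)^{-j} \right] . \qquad \ldots(1.40)$$

In particular

$$\kappa_1 = E(z^*) = -\frac{2N - 1}{N(N-1)} \qquad \text{and} \qquad \kappa_2 = \operatorname{var}(z^*) = \frac{2N^2 - 2N + 1}{N^2 (N-1)^2} . \qquad \ldots(1.41)$$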
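The cumulant formula (1.40) can be compared with a simulation. The sketch below (hypothetical names) evaluates $\kappa_1$ and $\kappa_2$ from (1.40) and compares them with the sample mean and variance of $\ln R$ for rectangular samples of size $N = 5$, for which (1.41) gives $E(z^*) = -(1/5 + 1/4) = -0.45$ and $\operatorname{var}(z^*) = 1/25 + 1/16 = 0.1025$.

```python
import math
import random


def log_range_cumulant(j, n):
    """j-th cumulant (1.40) of z* = ln R for a rectangular sample of size n."""
    return (-1) ** j * math.factorial(j - 1) * (n ** -j + (n - 1) ** -j)


n = 5
kappa1 = log_range_cumulant(1, n)  # E(z*) from (1.41)
kappa2 = log_range_cumulant(2, n)  # var(z*) from (1.41)

rng = random.Random(2024)
logs = []
for _ in range(50000):
    xs = [rng.random() for _ in range(n)]
    logs.append(math.log(max(xs) - min(xs)))
mean_log = sum(logs) / len(logs)
var_log = sum((v - mean_log) ** 2 for v in logs) / (len(logs) - 1)
```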


We note also, from (1.38), that

$$\lim_{N\to\infty} \phi_{z^*}(t) = 1 . \qquad \ldots(1.42)$$

Therefore it follows from the continuity theorem that $z^*$ converges in distribution, and hence in probability, to zero (Cramér [8], sec. 10.4).

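Consider now the standardized variable

$$Z^* = \frac{z^* - E(z^*)}{\sigma_{z^*}} . \qquad \ldots(1.43)$$

Then $\phi_{Z^*}(t) = \exp[-it E(z^*)/\sigma_{z^*}] \, \phi_{z^*}(t/\sigma_{z^*})$, and from (1.41) it is seen that

$$\frac{E(z^*)}{\sigma_{z^*}} = -(2N - 1)(2N^2 - 2N + 1)^{-1/2}, \qquad \frac{1}{(N-1)\sigma_{z^*}} = N (2N^2 - 2N + 1)^{-1/2}, \qquad \frac{1}{N \sigma_{z^*}} = (N - 1)(2N^2 - 2N + 1)^{-1/2},$$

and therefore

$$\phi_{Z^*}(t) = \exp\left[it(2N-1)(2N^2 - 2N + 1)^{-1/2}\right] \left[1 + itN(2N^2 - 2N + 1)^{-1/2}\right]^{-1} \left[1 + it(N-1)(2N^2 - 2N + 1)^{-1/2}\right]^{-1} . \qquad \ldots(1.44)$$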

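It follows that

$$\phi_Z(t) = \lim_{N\to\infty} \phi_{Z^*}(t) = \exp\left[\sqrt{2}\, it\right] \left[1 + \frac{it}{\sqrt{2}}\right]^{-2}, \qquad \ldots(1.45)$$

where $Z$ is the limiting value of $Z^*$ as $N \to \infty$.

But the characteristic function of $u = a\chi^2_\nu + b$ is

$$\phi_u(t) = e^{itb} (1 - 2ita)^{-\nu/2},$$

whence, taking $\nu = 4$, $a = -1/(2\sqrt{2})$ and $b = \sqrt{2}$, it is seen that $Z$ is distributed as $\sqrt{2} - \chi^2_4/(2\sqrt{2})$. Also it follows from the continuity theorem that $Z^*$ converges in distribution to $Z$.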

Thus it is seen that, rather than tending to normality, standardized log-range tends to a linear function of a chi-square variable with four degrees of freedom when the sampled distribution is rectangular $[0, 1]$.
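The limiting result can be checked numerically from (1.40) and (1.44). The sketch below (hypothetical names) computes the standardized third and fourth cumulants of $z^*$ for increasing $N$. The limiting variable $\sqrt{2} - \chi^2_4/(2\sqrt{2})$ has standardized third cumulant $-\sqrt{8/4} = -\sqrt{2}$ and standardized fourth cumulant $12/4 = 3$, whereas a normal variable has zero for both. The sketch also compares the characteristic function (1.44) at a large $N$ with the limit (1.45).

```python
import cmath
import math


def log_range_cumulant(j, n):
    """j-th cumulant (1.40) of z* = ln R for a rectangular sample of size n."""
    return (-1) ** j * math.factorial(j - 1) * (n ** -j + (n - 1) ** -j)


def standardized_cumulants(n):
    """(kappa3 / kappa2^(3/2), kappa4 / kappa2^2) of z* for sample size n."""
    k2 = log_range_cumulant(2, n)
    return (log_range_cumulant(3, n) / k2 ** 1.5,
            log_range_cumulant(4, n) / k2 ** 2)


def cf_standardized(t, n):
    """Characteristic function (1.44) of Z* = (z* - E z*) / sd(z*)."""
    r = (2 * n * n - 2 * n + 1) ** -0.5
    return cmath.exp(1j * t * (2 * n - 1) * r) / (
        (1 + 1j * t * n * r) * (1 + 1j * t * (n - 1) * r))


def cf_limit(t):
    """Limit (1.45): exp(i sqrt(2) t) * (1 + i t / sqrt(2))^(-2)."""
    return cmath.exp(1j * math.sqrt(2) * t) / (1 + 1j * t / math.sqrt(2)) ** 2


for n in (2, 5, 20, 1000):
    skew, kurt = standardized_cumulants(n)
    print(n, round(skew, 4), round(kurt, 4))
```

The printed values settle near $-\sqrt{2}$ and $3$ rather than $0$ and $0$, which is the non-normal limit described above.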

However, it is felt that further investigation of the properties of this test would be justified, since its application is simpler than that of the L.V. test.

1.9. An Alternative Approach to the Analysis of Variance of Variances. In the analysis of variance of logarithms of variances the observations are grouped and transformed as described in 1.2. The model thus becomes
$$z_{km} = \mu_k + e_{km}, \qquad k = 1, \ldots, K; \quad m = 1, \ldots, M, \qquad \ldots(1.46)$$

where, under the assumption that the original observations are normal, the $e_{km}$ are log chi-square variables. The hypothesis of variance homogeneity may then be written as

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K \quad (= \mu, \text{ unspecified}).$$

This hypothesis can be split into a number of sub-hypotheses which together imply it. First a somewhat weaker hypothesis will be discussed. Then, using this, the above more general hypothesis may be tested.


Let $c_k$ be any given real numbers such that

$$\sum_{k=1}^{K} c_k = 0,$$

and define the contrast $\theta$ as

$$\theta = \sum_{k=1}^{K} c_k \mu_k .$$

Then we wish to test the null hypothesis $H_0': \theta = 0$ against the alternative $H_1': \theta \neq 0$.

Note that if $\sigma_k^2 = \sigma_{k'}^2$ for all $k, k'$, so that $\mu_k = \mu_{k'}$, then $H_0'$ is true for all $c_k$, but the converse is not necessarily true for a given set of $c_k$'s. To test the above null hypothesis the following procedure may be applied. Transform (1.46), by averaging on $m$, into
$$\bar{z}_{k.} = \mu_k + \bar{e}_{k.} . \qquad \ldots(1.47)$$

Clearly the $\bar{e}_{k.}$, $k = 1, \ldots, K$, are independent, identically distributed random variables. Let the estimated contrast be

$$\hat{\theta} = \sum_{k=1}^{K} c_k \bar{z}_{k.} . \qquad \ldots(1.48)$$

Since $E(\bar{e}_{k.})$ does not depend on $\mu_k$, and $\sum_k c_k = 0$, it follows that $E(\hat{\theta}) = \theta$, where $\theta = 0$ if $H_0'$ is true, and $\theta \neq 0$ otherwise. Consider

$$\hat{\theta} - \theta = \sum_{k=1}^{K} c_k \bar{e}_{k.} . \qquad \ldots(1.49)$$

Now the test consists of rejecting $H_0': \theta = 0$ if $|\hat{\theta} - \theta|$, or rather $|\hat{\theta}|$, is large, and accepting it otherwise. Thus the problem is solved if the distribution of the random variable in (1.49) is obtained. This can easily be done; the technical details will not be included here, since the problem seems capable of further extension.
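One possible way to complete such a test, given here only as a hedged sketch and not as the procedure intended in this report, is to estimate the common variance of the $e_{km}$ by the pooled within-group mean square $s_w^2$ and divide $\hat{\theta}$ by its estimated standard error $\{s_w^2 \sum_k c_k^2 / M\}^{1/2}$. Because the $e_{km}$ are log chi-square rather than normal variables, the resulting ratio would be only approximately a Student $t$ variable with $K(M-1)$ degrees of freedom. The function name is hypothetical.

```python
import math


def contrast_statistic(z, c):
    """Estimated contrast (1.48) and its ratio to an estimated standard error.

    z[k][m] holds the transformed subgroup values z_km (K groups, M values each);
    c[k] are contrast coefficients, which must sum to zero.
    """
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k_groups, m = len(z), len(z[0])
    means = [sum(row) / m for row in z]
    theta_hat = sum(ck * mk for ck, mk in zip(c, means))
    ss_within = sum(sum((v - mk) ** 2 for v in row) for row, mk in zip(z, means))
    s2_within = ss_within / (k_groups * (m - 1))
    std_error = math.sqrt(s2_within * sum(ck * ck for ck in c) / m)
    return theta_hat, theta_hat / std_error


# Contrast of the first group against the third, on illustrative values.
theta_hat, ratio = contrast_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]], [1, 0, -1])
```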

This concludes our discussion on the Analysis of Variance of variances. It is clear that certain definite progress has been made towards solving the problem, but that there is still a need for further investigation into various aspects of the methods proposed.
2.


A TRANSFORMATION PROCEDURE IN THE ANALYSIS OF VARIANCE


2.1. Transformations in the Analysis of Variance. The assumptions underlying the classical Analysis of Variance theory are that the cell populations be distributed normally, and that the variances within each group be equal. It has been suggested by several writers that the validity of the F-test may not be seriously affected by lack of normality per se, at least when the sample sizes are equal, but that the test is sensitive to variance heterogeneity. It is for this reason that the methods discussed in Part I of this report have been presented. However, in Part I it was assumed that the data could be described by one particular mathematical model: the variables within each subgroup were assumed to be normally distributed, but the within-group variance was not necessarily the same for all the subgroups. Clearly, in a general investigation of the problem of variance heterogeneity in the Analysis of Variance, it is necessary to study other models also.

A different hypothesis will now be considered, namely that the variables are no longer normally distributed, but that a transformation may be found that will transform their distribution to normality. It will be further assumed that variance heterogeneity is caused by a relationship between the mean and the standard deviation within each group. Suitable transformations for stabilizing the variance, although not necessarily normalizing the variable, have been proposed for some particular cases by Curtiss [9], and by others more recently. However, it appears that no method of testing whether such a situation really exists has been put forward. It is proposed in this section to suggest a procedure for testing whether to transform, and to discuss the possibility of deciding between different transformations.

The idea has been put forward that the transformation to be used might be one of a family of transformations, such as $y = (x + c)^p$, where $y$ is distributed normally (Tukey [39]). This transformation may by definition be made to include the logarithmic transformation for the case $p = 0$. The problem, then, becomes one of estimating the parameters $p$ and $c$ from the data, carrying out the transformation, and analysing the transformed data. However, the writers feel that since the sample sizes will generally be small, and hence the power to discriminate between different distributions low, it would be more advantageous to restrict the choice to a few readily applied transformations, square root and logarithmic, for example.
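The family just described can be written down directly. In the sketch below (hypothetical names), `power_transform` returns $(x + c)^p$ and, by definition, $\ln(x + c)$ when $p = 0$. The second function uses the rescaled form $\{(x + c)^p - 1\}/p$, a standard device that is not taken from this report; it tends to $\ln(x + c)$ as $p \to 0$, which shows why the logarithm is the natural member of the family at $p = 0$.

```python
import math


def power_transform(x, p, c=0.0):
    """Tukey-type family y = (x + c)**p, with p = 0 taken to mean ln(x + c)."""
    if x + c <= 0:
        raise ValueError("x + c must be positive")
    return math.log(x + c) if p == 0 else (x + c) ** p


def rescaled_power_transform(x, p, c=0.0):
    """((x + c)**p - 1) / p, which tends to ln(x + c) as p tends to zero."""
    if p == 0:
        return math.log(x + c)
    return ((x + c) ** p - 1.0) / p


data = [1.0, 4.0, 9.0]
square_roots = [power_transform(x, 0.5) for x in data]
logs = [power_transform(x, 0) for x in data]
```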

With samples of the sizes generally found in the Analysis of Variance, it is unlikely that much would be gained from considering many alternative transformations. In fact, a possible procedure would be to choose a transformation before testing whether or not to transform; then, if the test indicates that a transformation is needed, the chosen transformation would be applied whatever the distribution of the data might be.

When there is a relationship between the mean and the variance of the within-group distributions in the Analysis of Variance, it may be the result of the distributions being log-normal or Pearson Type III. (For a discussion of these frequency distributions the reader is referred to [1] and [13].) In this case the samples tend to have many observations below the sample mean and relatively close to it, while they often have fewer readings above the mean, many of them at a greater distance from it. This suggests that a suitable transformation should have the effect of condensing the upper tail of the distribution. Thus a suitable transformation might be the logarithmic transformation or the square-root transformation. It was decided that throughout this preliminary investigation the logarithmic transformation should be used. Then, if the true distribution is log-normal and the parameters have been estimated from the data, it would be hoped that the transformed distribution would be approximately normal. Moreover, it may be seen from Table 15, page 43, of this report (or more fully from Table 1, page 130, of [3]) that $\gamma_1$ and $\gamma_2$ for the distribution of $\ln s^2$ are nearer to zero than the corresponding values for the distribution of $s^2$. (Note: $\gamma_1 = \gamma_2 = 0$ for a normal distribution.) Thus it seems that if the data are in fact distributed as a chi-square distribution, rather than as a log-normal distribution, the logarithmic transformation will still tend to normalize the data.

Thus it is proposed that the following procedure be adopted:

1. Test to determine whether a transformation should be made.

2. a) If it is decided not to transform, analyze the original data.

b) If it is decided to transform, carry out the logarithmic transformation, and then analyze the transformed data. The analysis having been completed on the transformed data, it will be necessary to interpret the conclusions in terms of the original data.

The possibility of obtaining a test based on the likelihood-ratio criterion was investigated, but it was decided that the resulting procedure would be too cumbersome for most practical purposes. Instead, a simpler testing procedure was sought, even if its power of detecting departure from the Analysis of Variance assumptions might be somewhat less than that of a test based on the likelihood ratio.

2.2. To test the null hypothesis that the data are normally distributed. As has been stated in 2.1, the validity of the assumption of homogeneity of variance in the Analysis of Variance is frequently questioned because there appears to be a relationship between the mean and the variance of the population distribution. It is seen that this may result in actually testing for lack of normality within the samples, rather than for heterogeneity of variance between the samples.

Consider now the following three distributions of the random variable $x$, each of which has mean $\xi$ and variance $\sigma_x^2$.

i) Normal.

$$p_N(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left[-\frac{(x - \xi)^2}{2\sigma_x^2}\right], \qquad -\infty < x < +\infty . \qquad \ldots(2.1)$$

The moments and cumulant-ratios of this distribution are:

$$\mu_1'(x) = \xi, \quad \mu_2(x) = \sigma_x^2, \quad \mu_3(x) = 0, \quad \mu_4(x) = 3\sigma_x^4, \quad \gamma_1(x) = 0, \quad \gamma_2(x) = 0 .$$


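ii) Log-normal.

$$p_L(x) = \frac{1}{\sigma_y \sqrt{2\pi}} \, \frac{1}{x + c} \exp\left[-\frac{(y - \eta)^2}{2\sigma_y^2}\right], \qquad -c < x < +\infty, \qquad \ldots(2.2)$$

where $y = \ln(x + c)$, and $\eta$ and $\sigma_y^2$ are the mean and variance of $y$. For this distribution it may be shown that:

$$\begin{aligned}
\mu_1'(x) &= \xi = \exp\left[\eta + \tfrac{1}{2}\sigma_y^2\right] - c, \\
\mu_2(x) &= \sigma_x^2 = \exp\left[2\eta + \sigma_y^2\right] \left(\exp[\sigma_y^2] - 1\right), \\
\mu_3(x) &= \exp\left[3\eta + \tfrac{3}{2}\sigma_y^2\right] \left(\exp[\sigma_y^2] - 1\right)^2 \left(\exp[\sigma_y^2] + 2\right), \\
\mu_4(x) &= \exp\left[4\eta + 2\sigma_y^2\right] \left(\exp[\sigma_y^2] - 1\right)^2 \left(\exp[4\sigma_y^2] + 2\exp[3\sigma_y^2] + 3\exp[2\sigma_y^2] - 3\right), \\
\gamma_1(x) &= \left(\exp[\sigma_y^2] - 1\right)^{1/2} \left(\exp[\sigma_y^2] + 2\right), \\
\gamma_2(x) &= \exp[4\sigma_y^2] + 2\exp[3\sigma_y^2] + 3\exp[2\sigma_y^2] - 6 .
\end{aligned}$$

[It should be noted that if $\sigma_x = A(\xi + c)$, where $A$ is constant for all $\xi$, $\sigma_x$, then $\sigma_y$ is constant.]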
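The log-normal formulas can be checked for internal consistency. The sketch below (hypothetical names) evaluates the central moments above for given $\eta$ and $\sigma_y^2$, forms $\gamma_1 = \mu_3/\mu_2^{3/2}$ and $\gamma_2 = \mu_4/\mu_2^2 - 3$, and confirms that these agree with the closed forms and do not change when $\eta$ is altered with $\sigma_y^2$ held fixed.

```python
import math


def lognormal_central_moments(eta, s2):
    """mu2, mu3, mu4 of x when ln(x + c) ~ N(eta, s2); c does not enter."""
    w = math.exp(s2)
    mu2 = math.exp(2 * eta + s2) * (w - 1)
    mu3 = math.exp(3 * eta + 1.5 * s2) * (w - 1) ** 2 * (w + 2)
    mu4 = (math.exp(4 * eta + 2 * s2) * (w - 1) ** 2
           * (w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3))
    return mu2, mu3, mu4


def lognormal_gammas(s2):
    """Closed forms of gamma1 and gamma2, functions of sigma_y^2 alone."""
    w = math.exp(s2)
    return (w - 1) ** 0.5 * (w + 2), w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 6


def gammas_from_moments(mu2, mu3, mu4):
    """gamma1 = mu3 / mu2^(3/2) and gamma2 = mu4 / mu2^2 - 3."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2 - 3


for eta in (0.0, 1.0, 3.0):
    print(eta, gammas_from_moments(*lognormal_central_moments(eta, 0.25)))
print(lognormal_gammas(0.25))
```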

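iii) Root-normal. (The relationship between this distribution and a chi-square with one degree of freedom is readily seen.) Here $x + c = y^2$, where $y$ is normally distributed with mean $\eta$ and variance $\sigma_y^2$, so that

$$p_R(x) = \frac{1}{2\sigma_y \sqrt{2\pi (x + c)}} \left\{ \exp\left[-\frac{\left((x + c)^{1/2} - \eta\right)^2}{2\sigma_y^2}\right] + \exp\left[-\frac{\left((x + c)^{1/2} + \eta\right)^2}{2\sigma_y^2}\right] \right\}, \qquad -c < x < +\infty, \qquad \ldots(2.3)$$

where

$$\eta^2 = \left[(\xi + c)^2 - \tfrac{1}{2}\sigma_x^2\right]^{1/2}, \qquad \sigma_y^2 = (\xi + c) - \eta^2,$$

and for this distribution it is seen that:

$$\begin{aligned}
\mu_1'(x) &= \xi = \sigma_y^2 + \eta^2 - c, \\
\mu_2(x) &= \sigma_x^2 = 2\sigma_y^4 + 4\sigma_y^2\eta^2, \\
\mu_3(x) &= 8\sigma_y^6 + 24\sigma_y^4\eta^2, \\
\mu_4(x) &= 48\sigma_y^8 + 192\sigma_y^6\eta^2 + 3\left(2\sigma_y^4 + 4\sigma_y^2\eta^2\right)^2, \\
\gamma_1(x) &= \left(8 + 24\frac{\eta^2}{\sigma_y^2}\right)\left(2 + 4\frac{\eta^2}{\sigma_y^2}\right)^{-3/2}, \\
\gamma_2(x) &= \left(48 + 192\frac{\eta^2}{\sigma_y^2}\right)\left(2 + 4\frac{\eta^2}{\sigma_y^2}\right)^{-2} .
\end{aligned}$$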
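A similar consistency check applies to the root-normal case, assuming (as the moments above imply) that $x + c = y^2$ with $y$ normal with mean $\eta$ and variance $\sigma_y^2$. The sketch below (hypothetical names) evaluates the central moments, forms $\gamma_1$ and $\gamma_2$, checks them against the closed forms, and shows that they depend on $\eta$ and $\sigma_y$ only through $\eta^2/\sigma_y^2$. With $\eta = 0$ they reduce to the chi-square values $2\sqrt{2}$ and $12$ for one degree of freedom.

```python
import math


def root_normal_central_moments(eta, sy2):
    """mu2, mu3, mu4 of x when x + c = y^2 and y ~ N(eta, sy2)."""
    mu2 = 2 * sy2 ** 2 + 4 * sy2 * eta ** 2
    mu3 = 8 * sy2 ** 3 + 24 * sy2 ** 2 * eta ** 2
    mu4 = 48 * sy2 ** 4 + 192 * sy2 ** 3 * eta ** 2 + 3 * mu2 ** 2
    return mu2, mu3, mu4


def root_normal_gammas(ratio):
    """Closed forms of gamma1 and gamma2 as functions of ratio = eta^2 / sigma_y^2."""
    return ((8 + 24 * ratio) * (2 + 4 * ratio) ** -1.5,
            (48 + 192 * ratio) * (2 + 4 * ratio) ** -2)


def gammas_from_moments(mu2, mu3, mu4):
    """gamma1 = mu3 / mu2^(3/2) and gamma2 = mu4 / mu2^2 - 3."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2 - 3


# Three (eta, sigma_y^2) pairs sharing eta^2 / sigma_y^2 = 2 give the same gammas.
for eta, sy2 in ((1.0, 0.5), (2.0, 2.0), (3.0, 4.5)):
    print(gammas_from_moments(*root_normal_central_moments(eta, sy2)))
```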

It  is  seen  from  the  above  results  that  the  values  of  )t \  and 
$  2  for  the  log-normal  distribution  are  functions  solely  of  fly*, 
whereas  for  the  root-normal  distribution  they  are  functions  de¬ 
pending  only  upon  ^ 2/  fl-y2.  Thus  it  would  appear  that  in  order 
to  test  for  normality  of  a  given  distribution  with  either  of  the 
above  distributions  as  alternatives  we  may  investigate  either 
|f  1  or  |f  g  of  the  population.  Geary  and  Pearson  [15]  found  that, 
except  for  very  large  samples,  (fa  was  difficult  to  investigate; 
thus  we  shall  now  propose  a  method  for  testing  whether  ^f |  -  0, 

Now a distribution that is symmetrical about its mean has $\gamma_1 = 0$, whereas both the log-normal and the root-normal distributions above have $\gamma_1 > 0$; therefore, against these alternatives, a criterion that tests for symmetry will at the same time test whether $\gamma_1 = 0$. Moreover, it would seem justifiable to use a test for symmetry, since it is likely that the F-test in the Analysis of Variance would be more affected by skewness of the sampled distribution than by possible departure from normal kurtosis.
Let us assume, then, that observations have been drawn randomly and independently and are classified into $k$ groups, there being $n_t$ observations in the $t$-th group. It should be noted that the $k$ groups may be arranged as a one-way classification or according to a more complicated model. Let the $i$-th observation in the $t$-th group be denoted by $x_{ti}$.

In addition, let us denote the mean and variance of the distribution from which the $t$-th group was drawn by $\xi_t$ and $\sigma_t^2$ respectively, and further let us assume that all the distributions are continuous, have equal limits (which may be infinite), and have a third moment about their mean equal to the common value $\mu_3$.

Then the null hypothesis that we really wish to test is that the $x_{ti}$ are normally distributed, but it is decided that a weaker hypothesis will be tested, namely:

$$H_0: \mu_3 = 0 \tag{2.4}$$

against the alternative:

$$H_1: \mu_3 > 0. \tag{2.5}$$

2.3. Test procedure. In order to carry out the following procedure there must be at least three replications in at least several of the groups, as will be seen below. The procedure is:

i) Calculate $\bar x_t$, the mean of the $t$-th group, for $t = 1,\ldots,k$.

ii) Consider first those groups for which $n_t$ is odd; to each such group assign a characteristic random variable $Y_t$, such that

$Y_t = 1$ if the number of $(x_{ti} - \bar x_t) > 0$ is greater than $\tfrac{1}{2}n_t$,
$Y_t = 0$ otherwise.

iii) In each of the groups for which $n_t$ is even, choose one observation at random (for example, by using a table of random numbers); then assign to each of these groups a $Y_t$ such that

$Y_t = 1$ if the number of remaining $(x_{ti} - \bar x_t) > 0$ is greater than $\tfrac{1}{2}(n_t - 1)$,
$Y_t = 0$ otherwise.

(Note: $\bar x_t$ may still be the mean of all $n_t$ observations.)

iv) Calculate

$$Y = \sum_{t=1}^{k} Y_t.$$

v) Rule: Reject $H_0$ if $Y \le c_{k,\alpha}$.

Since the $Y_t$ are independent, and $P\{Y_t = 1 \mid H_0\} = \tfrac{1}{2}$, the values of $c_{k,\alpha}$ may be obtained from tables of the Binomial Probability distribution such as [35] or [37]. Suggested critical values are listed in Table 16, with the corresponding values of $\alpha$, the probability of a Type I error.

It should be noted that since the $x_{ti}$ all have continuous distributions, $P\{x_{ti} = \bar x_t\} = 0$. However, owing to rounding off in practical examples it sometimes occurs that an individual observation is equal to its group mean; it is suggested that any such observation be disregarded when assigning the $Y_t$.
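Steps i) to v), together with the rule for observations that coincide with their group mean, might be coded as in the following Python sketch. It is not from the report: the function names are hypothetical, and the order in which ties are dropped and the random discard is made (ties first) is an assumption, since the text does not specify it.

```python
import random


def group_indicator(obs, rng):
    """Y_t for one group: 1 if more than half of the retained deviations
    from the group mean are positive, else 0."""
    mean = sum(obs) / len(obs)                    # x-bar_t uses all n_t observations
    devs = [x - mean for x in obs if x != mean]   # assumption: drop ties first
    if len(devs) < 3:
        raise ValueError("each group needs at least three usable observations")
    if len(devs) % 2 == 0:                        # step iii): discard one at random
        devs.pop(rng.randrange(len(devs)))
    positives = sum(1 for d in devs if d > 0)
    return 1 if 2 * positives > len(devs) else 0


def y_statistic(groups, seed=None):
    """Y = sum of the Y_t over the k groups (step iv)."""
    rng = random.Random(seed)
    return sum(group_indicator(g, rng) for g in groups)


def reject_h0(y, c_k_alpha):
    """Step v): reject H0 (mu_3 = 0) in favour of mu_3 > 0 when Y <= c_{k,alpha}."""
    return y <= c_k_alpha
```

Right skewness pulls the group mean above the median, so most deviations are negative and $Y_t$ tends to be 0; this is why small values of $Y$ lead to rejection.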


TABLE 16

CRITICAL VALUES FOR Y

 k     c(k,α)      α
 4       0       .0625
 5       0       .0313
 6       0       .0156
 7       1       .0625
 8       1       .0352
 9       1       .0195
10       2       .0547
11       2       .0327
12       2       .0193
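Because $Y$ is Binomial$(k, \tfrac{1}{2})$ under $H_0$, each $\alpha$ in Table 16 is $P\{Y \le c_{k,\alpha}\} = 2^{-k}\sum_{j=0}^{c_{k,\alpha}}\binom{k}{j}$. The short Python sketch below recomputes the tabulated $\alpha$ values. The helper `critical_value` (a hypothetical name) encodes the rule "largest $c$ with $\alpha \le .0625$"; that rule is an inference which reproduces every row of Table 16 but is not stated in the text.

```python
from math import comb


def binomial_alpha(k, c):
    """P{Y <= c | H0} with Y ~ Binomial(k, 1/2)."""
    return sum(comb(k, j) for j in range(c + 1)) / 2 ** k


def critical_value(k, alpha_max=0.0625):
    """Largest c with P{Y <= c | H0} <= alpha_max (None if even c = 0 exceeds it).
    The default alpha_max is an assumption that happens to match Table 16."""
    best = None
    for c in range(k + 1):
        if binomial_alpha(k, c) > alpha_max:
            break
        best = c
    return best


# (k, c_{k,alpha}, alpha) as printed in Table 16
TABLE_16 = [(4, 0, .0625), (5, 0, .0313), (6, 0, .0156), (7, 1, .0625), (8, 1, .0352),
            (9, 1, .0195), (10, 2, .0547), (11, 2, .0327), (12, 2, .0193)]

if __name__ == "__main__":
    for k, c, alpha in TABLE_16:
        print(k, critical_value(k), f"{binomial_alpha(k, c):.4f}", alpha)
```

The same two functions extend the table to $k > 12$, under the stated assumption about how $c_{k,\alpha}$ was chosen.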

Example. As an example consider the data reproduced in Table 17, from Table 5 of [29]. This data represents a cross-classification model with three rows and four columns. The distribution of the population is known, and is in fact that of (2.2), the log-normal, with $\sigma_y = 1$, $c = 0$, and with the quantity $\eta$ for any given cell made up of the sum of a row effect, a column effect, and an interaction.

TABLE 17

DATA FOR TWO-WAY CROSS-CLASSIFICATION ANALYSIS OF VARIANCE

(... and variances are at the right of each cell, the numbers (i,j) above each cell denote ...)

In order to carry out the Y-test we notice that no observation is equal to its cell mean, and that there are ten observations in each cell. Thus it is necessary to ignore one observation from each cell when assigning the value of $Y_t$ to the cell. (The choice was made at random.) The value of $Y_t$ and the ignored observation for each cell are listed in Table 18 below.


TABLE 18

Y-test on data of Table 17

Cell     Y_t     Ignored observation
(1,1)     0       5
(1,2)     0       5
(1,3)     0       7
(1,4)     0       3
(2,1)     0      10
(2,2)     0       5
(2,3)     0       5
(2,4)     0       9
(3,1)     0       1
(3,2)     0       1
(3,3)     0       3
(3,4)     1       9

Thus we obtain

$$Y = \sum_{t=1}^{12} Y_t = 1,$$

which from Table 16 is seen to be significant, with $\alpha = .02$ approximately.

The null hypothesis of symmetry is therefore rejected, and it is decided to apply the logarithmic transformation to the data before carrying out the Analysis of Variance.
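As an arithmetic check on the example, the $Y_t$ column of Table 18 can be summed and compared with the Binomial$(12, \tfrac{1}{2})$ null distribution. This Python fragment is illustrative only; the exact tail probability it prints is a derived quantity, not a figure given in the report.

```python
from math import comb

# Y_t for cells (1,1), (1,2), ..., (3,4), copied from Table 18
y_t = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
k, y = len(y_t), sum(y_t)
c_12 = 2                                            # c_{12,alpha} from Table 16


def lower_tail(k, c):
    """P{Y <= c | H0} for Y ~ Binomial(k, 1/2)."""
    return sum(comb(k, j) for j in range(c + 1)) / 2 ** k


alpha_12 = lower_tail(k, c_12)                      # size of the test, 79/4096
p_value = lower_tail(k, y)                          # exact P{Y <= 1 | H0}, 13/4096
print(y, y <= c_12, round(alpha_12, 4), round(p_value, 4))   # 1 True 0.0193 0.0032
```

So $Y = 1$ lies well inside the rejection region $Y \le 2$, whose size is about .02.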

2.4. Further work. Having decided to apply to the data a transformation of the form

$$y = \log(x + c) \tag{2.6}$$

we must either know $c$ a priori or estimate it from the data. ($-c$ is the lower limit of the distributions.)

Work being carried out at present includes investigation of methods of estimating $c$.

It is proposed to investigate the power of the Y-test against particular forms of alternative distribution, for various sample sizes, by a Monte Carlo procedure. It is hoped also to investigate the possibility of making a test on the data which would show whether the logarithmic or the square-root transformation is the more appropriate for the particular data under consideration. However, a preliminary study has suggested that residual variation may frequently obscure which form of transformation is the more appropriate.

Finally, it should be noted that the Y-test procedure requires a reasonable number of cells having at least three observations in them. When these conditions are not satisfied some alternative procedure is required; one possibility that should be investigated is that of always transforming to standard normal scores when the observations are such that no test of normality may be made.
3

MISCELLANEOUS TOPICS

The topics of the following sections have been investigated, and although it was felt that each one justified inclusion in this report, it was decided that none warranted an individual main section.

3.1. Correlated variables and transformations.* In applying transformations, it has almost always been assumed that the observations in the sample under consideration are independently distributed but that their common distribution is not normal. Consequently, a transformation which makes them "nearly normal" is applied before the Analysis of Variance of the data is performed. It is not uncommon, however, for the observations to be correlated in some manner, and if so, transformations for this type of variable lead to decidedly more difficult problems. In these circumstances it may be of interest to transform the data so as to obtain "nearly normal" correlated data and carry on the analysis. The distribution problems for correlated normal variables are not as completely solved as in the independent case. (Cf., for example, [7], [16] and [19].) The solutions to these problems are needed, however, as complementary to, and before application of, the transformations. Some of these questions have been considered recently, in a different manner, by Mizel and Rao [25], where the solution to the problem on correlated normal variables is deduced from a somewhat more general result. That solution will be briefly described in the present context.

* The authors are grateful to V. J. Mizel and M. M. Rao, Carnegie Institute of Technology, Pittsburgh, Pa., who provided this section. Their conclusions were included in a joint paper [25], read by M. M. Rao before the annual meeting of the American Mathematical Society, January 1961, in Washington, D. C.

Let $x = (x_1,\ldots,x_n)$ be a correlated sample from a normal distribution whose mean vector is $\mu$ and covariance matrix is $\Sigma$ (both of $n$-th order). Suppose the breakdown of the sum of squares is done analogously to that in the case of independence. Formally, this means that a quadratic form $Q = xAx'$ is broken up into several other forms, for instance $Q_1$ and $Q_2$. Let $Q_i = xA_ix'$, so that, since

$$Q = Q_1 + Q_2, \tag{3.1}$$

we have $A = A_1 + A_2$. If $Q$ is known to be distributed as a chi-square (central or not makes no difference in what follows) with, say, $n_0$ degrees of freedom, then the problem is this: for what types of $A_1$ and $A_2$ are $Q_1$ and $Q_2$ independently distributed as chi-square variables?

It may be seen that this problem is related to the results about, and extending, Cochran's theorem. For a detailed discussion and some extensions of the classical problem, reference may be made to [7] and [16]. In a recent note [19], it was shown that if $A_2$ in (3.1) above is a non-negative definite matrix, and if $Q$ and $Q_1$ are distributed in chi-square form, then $Q_2$ is also a chi-square variable. At this point it is natural to inquire how far this condition may be relaxed, or whether alternative (and possibly weaker) conditions may be obtained. (It is known that a quadratic form in normal variables, such as $Q$, is distributed as a (non-central) chi-square variable if and only if $(A\Sigma)^2 = A\Sigma$, i.e. $A\Sigma$ is idempotent.) In [4], this problem is characterized as follows. If, in (3.1), $Q$ and $Q_1$ are each distributed as chi-square variables with $n_0$ and $n_1$ $(< n_0)$ degrees of freedom, then in order that $Q_2$ be distributed as a chi-square variable with $n_2$ $(= n_0 - n_1)$ degrees of freedom independently of $Q_1$, it is necessary and sufficient that $A_2$ and $\Sigma$ commute and that $A_2\Sigma$ be positive semi-definite (i.e. $A_2\Sigma = \Sigma A_2 \ge 0$). This implies that, when the covariance matrix is given (it should be known in all problems of this type, or the experimenter should have some idea about it), a breakdown such as (3.1) has to be done in a somewhat more careful manner than in the independence case. The significance of this is, then, that the transformation problem for correlated observations is in general more difficult, and consequently more care should be given to its treatment whenever the transformation of data is contemplated.
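The two conditions quoted from [4] are easy to check numerically for a given $\Sigma$. The Python sketch below is a hypothetical illustration, not from the report, using pure-Python matrices. With unit variances and common correlation $\rho$, so that $\Sigma = (1-\rho)I + \rho J$, the within-sample sum of squares $\sum(x_i - \bar x)^2$ has $A_2 = I - J/n$. Scaling it by $1/(1-\rho)$ makes $A_2\Sigma$ idempotent, while the usual unscaled form is not; neither form commutes with a first-order autoregressive $\Sigma$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][r] * b[r][j] for r in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def equicorrelated(n, rho):
    """Sigma = (1 - rho) I + rho J: unit variances, common correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """Sigma_ij = rho**|i - j| (first-order autoregressive correlations)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def centering(n, scale=1.0):
    """scale * (I - J/n): the matrix of the sum of squares about the sample mean."""
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)] for i in range(n)]


def check(a2, sigma):
    """Return (A2 Sigma == Sigma A2, A2 Sigma idempotent, trace of A2 Sigma)."""
    p = matmul(a2, sigma)
    commutes = close(p, matmul(sigma, a2))
    idempotent = close(matmul(p, p), p)
    return commutes, idempotent, sum(p[i][i] for i in range(len(p)))


if __name__ == "__main__":
    n, rho = 4, 0.3
    print(check(centering(n, 1 / (1 - rho)), equicorrelated(n, rho)))  # commutes, idempotent, trace ~ 3
    print(check(centering(n), equicorrelated(n, rho)))                 # commutes, not idempotent
    print(check(centering(n), ar1(n, rho)))                            # does not commute
```

A commuting, idempotent $A_2\Sigma$ is symmetric with eigenvalues 0 and 1, hence positive semi-definite, and its trace gives the degrees of freedom ($n - 1 = 3$ here).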


3.2. Tests on the mean of a non-normal variable.* Suppose that it is desired to test the hypothesis $H_0: \mu_x = \mu_0$ against the alternative hypothesis $H_1: \mu_x = \mu_1$, where $x$ is a continuous random variable with mean $\mu_x$ and variance $\sigma_x^2$, whose distribution function is not normal.

Suppose also that there exists a transformation $y = (x + c)^p$ which normalizes $x$ for some constants $c$ and $p$ ($y$ is distributed normally with mean $\mu_y$ and variance $\sigma_y^2$).

The objects of this investigation are (i) to determine whether $x$ has an asymptotically normal distribution (as the coefficient of variation approaches zero), so that the usual normal theory tests may be applied when the mean is large relative to the standard deviation, and (ii) to find a method for estimating "best" (in some sense of the word) values of $c$ and $p$ for a given set of data.

* This investigation is being conducted by Miss C. D. Tramell.

For $p = 2k + 1$, $k$ being a positive integer, the transformation $y = (x + c)^p$ is continuous and monotone, since

$$\frac{dy}{dx} = (2k + 1)(x + c)^{2k} > 0, \qquad x > -c. \tag{3.2}$$

Thus a theorem of Olds and Severo [30], p. 36, is applicable to this case. This theorem gives the most powerful critical region of size $\alpha$ for testing the hypothesis $H_0: \mu_x = \mu_0$ against the alternative $H_1: \mu_x = \mu_1 > \mu_0$.

The case $p = \tfrac{1}{2}$ was considered by K. M. Rao [34], with $\sigma_y^2$ fixed.

It is hoped that results will be obtained for the case where $p = 1/n$, $n$ being a positive integer.

3.3. Some results on the moments of sample range. In Part I of this report two test procedures were proposed, namely, the log-variance test and the log-range test. In developing the theory for these tests it was assumed that within each group the observations were distributed normally and independently. Each group was subdivided and, according to which test was being used, the logarithm was calculated either of the variance or of the range of each subgroup. An Analysis of Variance was then performed on these quantities. An investigation was initiated to consider the effect upon these tests if the observations within each subgroup, whilst being normally distributed, were not independent. Thus it was assumed that the joint distribution of the variables within a subgroup was multivariate normal, with not all the correlations equal to zero. No useful progress has been made on this topic as such; however, some general results have been obtained for the mean and variance of sample range. It was decided to include these results in this report since they are of interest in their own right.

The results show that if range is used to estimate the population standard deviation, assuming the variables to be independent when they are in fact correlated, then the estimate will be seriously biased even if the correlation between the sample variables is small. The bias is increased if the result is used to estimate the standard deviation of the distribution of sample means, still assuming independence.

It is clear that these results might be used to modify the standard Statistical Quality Control techniques when it cannot be assumed that the observations within samples are independent.

The method used for investigation of the distribution of sample range when the variables are independent (see [17, Vol. 1, p. vi] or [32, p. 43]) cannot be extended to the situation where the variables are correlated. This is clear since, given the value of one variate, the values of all the others are no longer independent of it. Thus a different technique must be used. The following discussion will be confined to obtaining the moments of sample range in the case where the variates may be correlated.

3.3.1. A general expression for the moments of sample range. A sample of size $n$ is drawn from a continuous distribution such that the joint density function of the sample variates $x_1,\ldots,x_n$, where for instance the subscripts denote the order of drawing, is $p(x_1,\ldots,x_n)$. There are then $n!$ possible ways in which the subscripts $1,\ldots,n$ may occur when the sample is ordered with respect to the magnitude of the variates. One such ordering may be written

$$x_i > x_j > \cdots > x_p > x_q > x_l. \tag{3.3}$$

Then if we denote the sample range by $R$, the $s$-th moment of $R$ about zero will be given by the sum of $n!$ integrals (one for each possible ordering) of the form

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{x_i}\int_{-\infty}^{x_j}\cdots\int_{-\infty}^{x_p}\int_{-\infty}^{x_q}(x_i - x_l)^s\,p(x_1,\ldots,x_n)\,dx_l\,dx_q\cdots dx_j\,dx_i \tag{3.4}$$

which may be written as

$$E[(x_i - x_l)^s \mid x_i > x_j > \cdots > x_p > x_q > x_l]\;P\{x_i > x_j > \cdots > x_p > x_q > x_l\}.$$

For example, let us denote the variates of a sample of size three by $x$, $y$, $z$, whose joint density function is $p(x,y,z)$, and let us further denote the sample range by $R(x,y,z)$; then

$$\begin{aligned} E\{[R(x,y,z)]^s\} = {} & E[(x-y)^s \mid x>z>y]\,P\{x>z>y\} + E[(x-z)^s \mid x>y>z]\,P\{x>y>z\} \\ & + E[(y-z)^s \mid y>x>z]\,P\{y>x>z\} + E[(y-x)^s \mid y>z>x]\,P\{y>z>x\} \\ & + E[(z-x)^s \mid z>y>x]\,P\{z>y>x\} + E[(z-y)^s \mid z>x>y]\,P\{z>x>y\}. \end{aligned} \tag{3.5}$$

From this expression the moments of range may be calculated if the necessary integrals can be evaluated exactly; otherwise, approximations to the moments may be obtained by evaluating the integrals by numerical methods.

An interesting result is obtained for the mean range in a sample of size three by putting $s = 1$ in (3.5). If we then write, for example,

$$E(x - y) = \tfrac{1}{2}[E(x - y) + E(x - z + z - y)] = \tfrac{1}{2}[E(x - y) + E(x - z) + E(z - y)], \tag{3.6}$$

each expectation being conditional on $x > z > y$, it may readily be shown that

$$E[R(x,y,z)] = \tfrac{1}{2}\{E[R(x,y)] + E[R(y,z)] + E[R(z,x)]\} \tag{3.7}$$

where $R(x,y)$ denotes the range in a sample of size two whose variates $x$, $y$ have the joint density function

$$p(x,y) = \int_{-\infty}^{+\infty} p(x,y,z)\,dz. \tag{3.8}$$

Thus if the mean value of range in a sample of size two is obtainable, then it is always possible to deduce the mean range for a sample of size three. To obtain the former is frequently a trivial matter. Note: this result does not depend upon the form of $p(x,y,z)$.
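The identity behind (3.6) and (3.7) is that, for any three numbers, the range equals half the sum of the three pairwise absolute differences; taking expectations then gives (3.7) for any joint density. The Python sketch below (illustrative; function names hypothetical) checks this on dependent, non-identically distributed variates, chosen deliberately non-normal to reflect the remark that the result does not depend on the form of $p(x,y,z)$.

```python
import random


def sample_range(*v):
    return max(v) - min(v)


def half_pairwise(x, y, z):
    """(1/2)[R(x,y) + R(y,z) + R(z,x)], where R of a pair is its absolute difference."""
    return 0.5 * (abs(x - y) + abs(y - z) + abs(z - x))


def draw(rng):
    """Three dependent, non-identical, non-normal variates (an arbitrary example)."""
    u = rng.expovariate(1.0)
    x = u + rng.uniform(0, 1)
    y = 2.0 * u - rng.expovariate(2.0)
    z = rng.gauss(0, 1) + 0.5 * x
    return x, y, z


if __name__ == "__main__":
    rng = random.Random(2024)
    draws = [draw(rng) for _ in range(100000)]
    lhs = sum(sample_range(*d) for d in draws) / len(draws)   # estimate of E[R(x,y,z)]
    rhs = sum(half_pairwise(*d) for d in draws) / len(draws)  # (3.7) applied to the same draws
    print(lhs, rhs)                                           # agree up to rounding error
```

Because the identity holds draw by draw, the two averages agree to rounding error, not merely to within sampling error.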

3.3.2. Moments of sample range when the distribution is multivariate normal. In the following it will be assumed that the sample variates $x_1,\ldots,x_n$ have jointly a multivariate normal distribution with

$$E(x_i) = 0, \qquad \operatorname{var}(x_i) = \sigma_i^2, \qquad E(x_i x_j) = \rho_{ij}\sigma_i\sigma_j \quad \text{for } i \ne j. \tag{3.9}$$

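(i) Sample size: $n = 2$. It may easily be shown that

$$E(R) = \left(\frac{2}{\pi}\right)^{1/2}[\sigma_1^2 - 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2]^{1/2} \tag{3.10}$$

$$\operatorname{var}(R) = \left(1 - \frac{2}{\pi}\right)[\sigma_1^2 - 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2] \tag{3.11}$$

and if $\sigma_1 = \sigma_2 = \sigma$ these reduce to

$$E(R) = 2\sigma\left(\frac{1 - \rho_{12}}{\pi}\right)^{1/2} \quad \text{and} \quad \operatorname{var}(R) = 2\left(1 - \frac{2}{\pi}\right)(1 - \rho_{12})\sigma^2. \tag{3.12}$$

Now if $R$ is used to estimate $\sigma$, assuming $\rho_{12} = 0$, we use (see [32, p. 46])

$$\hat\sigma = \frac{\sqrt{\pi}}{2}\,R. \tag{3.13}$$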
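
Thus if there is in fact correlation,

$$E(\hat\sigma) = \sigma(1 - \rho_{12})^{1/2}. \tag{3.14}$$

Moreover, if, as in Quality Control techniques, sample range is used to estimate the standard deviation of the sample mean, assuming independence (i.e. $\hat\sigma_{\bar x} = \hat\sigma/\sqrt{2}$), the following would be obtained:

$$\frac{E(\hat\sigma_{\bar x})}{\sigma_{\bar x}} = \left(\frac{1 - \rho_{12}}{1 + \rho_{12}}\right)^{1/2}, \qquad \sigma_{\bar x} = \sigma\left(\frac{1 + \rho_{12}}{2}\right)^{1/2}. \tag{3.15}$$

Also, if the variance of the range is estimated on the assumption of independence, the following relationship holds:

$$\operatorname{var}(R \mid \rho) = (1 - \rho)\operatorname{var}(R \mid \rho = 0). \tag{3.16}$$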
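Results (3.10) to (3.16) for $n = 2$ can be checked by simulation. The Python sketch below is illustrative only: the function names are hypothetical, correlated pairs are generated as $x_1 = \sigma z_1$, $x_2 = \sigma(\rho z_1 + \sqrt{1-\rho^2}\,z_2)$, and `lambda_n2` assumes that the quantity tabulated as $\lambda$ in Table 19 is $E(\hat\sigma_{\bar x})/\sigma_{\bar x}$ with $\hat\sigma_{\bar x} = \hat\sigma/\sqrt{2}$ and $\sigma_{\bar x} = \sigma[(1+\rho)/2]^{1/2}$, an interpretation that reproduces the $n = 2$ column of that table.

```python
import math
import random


def n2_moments(s1, s2, rho):
    """E(R) and var(R) for a bivariate normal pair, from (3.10) and (3.11)."""
    v = s1 ** 2 - 2 * rho * s1 * s2 + s2 ** 2          # variance of x1 - x2
    return math.sqrt(2 / math.pi * v), (1 - 2 / math.pi) * v


def sigma_hat(r):
    """(3.13): estimate of sigma from one range of two, assuming independence."""
    return math.sqrt(math.pi) / 2 * r


def lambda_n2(rho):
    """E(sigma_hat / sqrt 2) / sigma_xbar for correlation rho (assumed meaning of lambda)."""
    return math.sqrt((1 - rho) / (1 + rho))


def simulated_mean_range(sigma, rho, reps, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = sigma * z1
        x2 = sigma * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)   # corr(x1, x2) = rho
        total += abs(x1 - x2)
    return total / reps


if __name__ == "__main__":
    rho = 0.5
    print(simulated_mean_range(1.0, rho, 100000), n2_moments(1.0, 1.0, rho)[0])
    print(sigma_hat(n2_moments(1.0, 1.0, rho)[0]), math.sqrt(1 - rho))   # (3.14)
    print([round(lambda_n2(r), 3) for r in (0.2, 0.4, 0.5, 0.8, 0.9)])
```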


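(ii) Sample size: $n = 3$. It is immediately obtained from (3.7) and (3.10) that

$$E(R) = (2\pi)^{-1/2}\left\{[\sigma_1^2 - 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2]^{1/2} + [\sigma_2^2 - 2\rho_{23}\sigma_2\sigma_3 + \sigma_3^2]^{1/2} + [\sigma_3^2 - 2\rho_{13}\sigma_1\sigma_3 + \sigma_1^2]^{1/2}\right\}. \tag{3.17}$$

It is possible to obtain expressions in closed form for (a) $\operatorname{var}(R \mid n = 3)$ and (b) $E(R \mid n = 4)$; however, very heavy algebra is involved.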

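If it is assumed that $\sigma_1 = \sigma_2 = \sigma_3 = \sigma$, and (3.17) is used to estimate $\sigma_{\bar x}$ assuming independence, it is seen that

$$\lambda = \frac{E(\hat\sigma_{\bar x})}{\sigma_{\bar x}} = \frac{(1 - \rho_{12})^{1/2} + (1 - \rho_{23})^{1/2} + (1 - \rho_{13})^{1/2}}{[3(3 + 2\rho_{12} + 2\rho_{23} + 2\rho_{13})]^{1/2}}. \tag{3.18}$$

Some values of this ratio are given in Table 19.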
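A Python sketch of (3.17) and (3.18) follows; it is illustrative and the function names are hypothetical. Here $\lambda$ is taken to be $E(\hat\sigma_{\bar x})/\sigma_{\bar x}$, where $\hat\sigma = R/d_2$ with $d_2 = 3/\sqrt{\pi}$ (the independent-case mean range for $n = 3$, obtained from (3.17) with all $\rho_{ij} = 0$), $\hat\sigma_{\bar x} = \hat\sigma/\sqrt{3}$, and $\sigma_{\bar x}^2 = \sigma^2(3 + 2\rho_{12} + 2\rho_{23} + 2\rho_{13})/9$. The code confirms that this construction agrees with the closed form (3.18).

```python
import math


def mean_range_n3(sigmas, r12, r23, r13):
    """E(R) for three correlated normal variates with equal means, from (3.17)."""
    s1, s2, s3 = sigmas
    pairs = [(s1, s2, r12), (s2, s3, r23), (s3, s1, r13)]
    total = sum(math.sqrt(a * a - 2 * r * a * b + b * b) for a, b, r in pairs)
    return total / math.sqrt(2 * math.pi)


def lambda_n3(r12, r23, r13):
    """Closed form (3.18) for sigma_1 = sigma_2 = sigma_3."""
    num = math.sqrt(1 - r12) + math.sqrt(1 - r23) + math.sqrt(1 - r13)
    return num / math.sqrt(3 * (3 + 2 * r12 + 2 * r23 + 2 * r13))


def lambda_from_first_principles(r12, r23, r13, sigma=1.0):
    d2 = mean_range_n3((1.0, 1.0, 1.0), 0.0, 0.0, 0.0)          # 3 / sqrt(pi)
    e_sigma_hat = mean_range_n3((sigma,) * 3, r12, r23, r13) / d2
    e_sigma_hat_xbar = e_sigma_hat / math.sqrt(3)                 # independence assumed
    true_sigma_xbar = sigma * math.sqrt(3 + 2 * (r12 + r23 + r13)) / 3
    return e_sigma_hat_xbar / true_sigma_xbar


if __name__ == "__main__":
    for rho in (0.2, 0.4, 0.8):
        print(rho, round(lambda_n3(rho, rho, rho), 3), round(lambda_n3(rho, rho, rho ** 2), 3))
```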

(iii) The case for general values of $n$, when all the correlations are equal, has been considered by Hartley [18]. In addition, the probability integral for $\max x_i$ has been tabulated by Kudo [22] for the case when the correlations are all equal.

Hartley demonstrated that if the means, $\mu$, and variances, $\sigma^2$, of all variates are equal, and if

$$\rho_{ij} = \rho > -\frac{1}{n-1}, \qquad i \ne j,$$

then the sample range is exactly distributed as the range in a sample of $n$ independent normal variates with variance $\sigma^2(1 - \rho)$, and, further, is distributed independently of the mean $\bar x$.

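Thus it is seen that

$$E(R \mid \rho) = (1 - \rho)^{1/2}\,E(R \mid \rho = 0)$$

and

$$\operatorname{var}(R \mid \rho) = (1 - \rho)\operatorname{var}(R \mid \rho = 0). \tag{3.19}$$

Thus it may be shown that

$$\lambda = \frac{E(\hat\sigma_{\bar x})}{\sigma_{\bar x}} = \left[\frac{1 - \rho}{1 + (n - 1)\rho}\right]^{1/2}, \qquad \rho > -\frac{1}{n-1}. \tag{3.20}$$

The effect of this for $n = 5$ is noted as follows:

ρ = .05, λ = .89
ρ = .20, λ = .67
ρ = .375, λ = .5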
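Formula (3.20) and the $n = 5$ figures can be reproduced with the Python sketch below (illustrative; function names hypothetical). For $\rho \ge 0$ an equicorrelated normal sample can be built as $x_i = \mu + \sigma(\sqrt{\rho}\,z_0 + \sqrt{1-\rho}\,z_i)$ with independent standard normal $z_0, z_1, \ldots, z_n$. The shared term cancels in the range, so the range is exactly $\sigma\sqrt{1-\rho}$ times the range of $n$ independent standard normals, which illustrates Hartley's result for non-negative $\rho$ (his result also covers $-1/(n-1) < \rho < 0$, which this construction does not).

```python
import math
import random


def lambda_equal(n, rho):
    """(3.20): E(sigma_hat_xbar) / sigma_xbar for n equicorrelated normal variates."""
    if rho <= -1 / (n - 1):
        raise ValueError("require rho > -1/(n-1)")
    return math.sqrt((1 - rho) / (1 + (n - 1) * rho))


def equicorrelated_sample(n, rho, mu, sigma, rng):
    """One sample with common correlation rho (rho >= 0 only) and the underlying z_i."""
    z0 = rng.gauss(0, 1)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [mu + sigma * (math.sqrt(rho) * z0 + math.sqrt(1 - rho) * z) for z in zs]
    return xs, zs


if __name__ == "__main__":
    for rho in (0.05, 0.20, 0.375):
        print(rho, round(lambda_equal(5, rho), 2))
```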


TABLE 19

VALUES OF λ

  ρ      n = 2    n = 3 (all ρ_ij = ρ)    n = 3 (ρ_12 = ρ_23 = ρ, ρ_13 = ρ²)
-1.0       ∞          n.a.                   1.633
 -.5     1.732         ∞                     2.041
  0      1.000       1.000                   1.000
 +.2      .816        .756                    .811
 +.4      .655        .577                    .642
 +.5      .577        .500                    .561
 +.6      .500        .426                    .486
 +.8      .333        .277                    .315
 +.9      .229        .189                    .215
+1.0       0           0                       0


Clearly there is a wide field here both for further investigation and for the application of these results. Until methods of using these results are proposed, the results should act as a warning to those who use the sample range to estimate population parameters without considering whether or not the sample variates are independent.